AD-  752  2)1 


THE  METHOD  OF  LEAST  SQUARES  AND  SOME 
ALTERNATIVES 

H.  Leon  Harter 

Aerospace  Research  Laboratories 
Wright-Patterson Air Force Base, Ohio

September  1972 


National  Technical  Information  Service 
U.  S.  DEPARTMENT  OF  COMMERCE 

5285  Port  Royal  Road,  Springfield  Va.  22151 


ARL 72-0129
SEPTEMBER  1972 


THE  METHOD  OF  LEAST  SQUARES 
AND  SOME  ALTERNATIVES 


H.  LEON  HARTER 

APPLIED  MATHEMATICS  RESEARCH  LABORATORY 


PROJECT  7071 


Approved  for  public  release;  distribution  unlimited. 



AIR  FORCE  SYSTEMS  COMMAND 


United States Air Force


NOTICES

When Government drawings, specifications, or other data are
used for any purpose other than in connection with a definitely related
Government procurement operation, the United States Government
thereby incurs no responsibility nor any obligation whatsoever; and
the fact that the Government may have formulated, furnished, or in
any way supplied the said drawings, specifications, or other data, is
not to be regarded by implication or otherwise as in any manner
licensing the holder or any other person or corporation, or conveying
any rights or permission to manufacture, use, or sell any patented
invention that may in any way be related thereto.


Agencies of the Department of Defense, qualified contractors, and
other Government agencies may obtain copies from:

Defense  Documentation  Center 
Cameron  Station 
Alexandria,  VA  22314 


This  document  has  been  released  (for  sale  to  the  public)  to: 


Aerospace Research Laboratories unless return is required by security
considerations, contractual obligations, or notices on a specific document.


AIR FORCE/56730/26 October 1972 - 500


UNCLASSIFIED
Security Classification

DOCUMENT CONTROL DATA - R & D

(Security classification of title, body of abstract and indexing annotation
must be entered when the overall report is classified)

3. REPORT TITLE
   The Method of Least Squares and Some Alternatives

4. DESCRIPTIVE NOTES (Type of report and inclusive dates)
   Scientific. Interim.

5. AUTHOR(S) (First name, middle initial, last name)
   H. Leon Harter

6. REPORT DATE
   September 1972

7a. TOTAL NO. OF PAGES; 7b. NO. OF REFS
   (values not legible)

8a. CONTRACT OR GRANT NO.
   Internal
 b. PROJECT NO. 7071-02-11
 c. DoD Element 61102F
 d. DoD Subelement 681304

9a. ORIGINATOR'S REPORT NUMBER(S)
   ARL TR 72-0129

9b. OTHER REPORT NO(S) (Any other numbers that may be assigned this report)

10. DISTRIBUTION STATEMENT
   Approved for public release; distribution unlimited.

11. SUPPLEMENTARY NOTES
   TECH OTHER

12. SPONSORING MILITARY ACTIVITY
   Aerospace Research Laboratories (LB)
   Air Force Systems Command
   Wright-Patterson AFB, Ohio 45433
ABSTRACT

A very important problem in mathematical statistics is that of finding the
best linear or nonlinear regression equation to express the relation between a
dependent variable and one or more independent variables. Given are observations,
each subject to random error, greater in number than the parameters in the
regression equation, on the dependent variable and the related values of the
independent variable(s), which may be known exactly or may also be subject to
random error. Related problems are those of choosing the best measures of central
tendency and dispersion of the observations. The best solutions of all three
problems depend upon the distribution of the random errors. If one assumes that
the values of the independent variable(s) are known exactly and that the errors
in the observations on the dependent variable are normally distributed, then it
is well known that the mean is the best measure of central tendency, the standard
deviation is the best measure of dispersion, and the method of least squares is
the best method of fitting a regression equation. Other assumptions lead to
different choices. Most practitioners have tended to make the assumption of
normality and not to worry about the consequences when it is not justified.
Another problem arises when the data are contaminated by spurious observations
(outliers) which come from distributions with different means and/or larger
standard deviations. Many methods have been proposed for rejecting outliers or
modifying them (or their weights). After summarizing (chronologically) the
voluminous literature on measures of central tendency and dispersion, the method
of least squares and numerous alternatives, the treatment of outliers, and robust
estimation, the author recommends a simple and reasonably robust set of procedures.
The reader who seeks a more sophisticated solution can choose one from among the
many given in the literature cited or devise one to fit the special conditions
of his problem.

1. INTRODUCTION

Since very early times, people have been interested in the problem of
choosing the best single value (average or mean) to summarize the information
given by a number of independent observations or measurements, each
subject to error, of the same quantity. Eisenhart (1972) presents evidence
of the use of such averages as the mode (the value occurring most frequently)
and the midrange (the value midway between the largest and smallest
observations) by the ancient Greeks and Egyptians and by the Arabs during the
Middle Ages. The median (the value such that there are the same number
of observations above as below it) and the arithmetic mean (the sum of all
the observations divided by their number) seem not to have come into use
until early in the modern era.

The problem of determining the constants in the equation of the
straight line which best fits (in some specified sense) three or more
non-collinear points in the (x,y) plane whose coordinates are pairs of associated
values of two related variables, x and y, dates back at least as far as
Galileo Galilei (1632). This problem can be generalized in two ways: (1)
Instead of finding the best linear equation in two variables (best line in
a plane), one may wish to find the best linear equation in three variables
(best plane in three-dimensional space) or in more than three variables
(best hyperplane in a hyperspace); (2) One may drop the requirement that the
equation be linear, and find the best curve in a plane, the best surface in
three-dimensional space, or the best hypersurface in a hyperspace.
Statisticians speak of these problems as those of linear and nonlinear regression.

The problems of determining the best average or measure of central
tendency and the best linear or nonlinear regression equation are related
to each other and to the problem of choosing the best measure of variability
or dispersion. The solutions of all three problems depend upon the
distribution of the errors or residuals (deviations of the observed values from
those predicted by the regression equation). The body of statistical theory
which treats all these related problems is called the theory of errors. In
the following sections we shall trace the development of the theory of
errors from the time of Galileo to the present day. We shall see that,
almost from the time early in the nineteenth century when it was first
proposed, the method of least squares has enjoyed a pre-eminence over other
methods in the theory of errors. We shall examine the question as to the
conditions under which this pre-eminence is deserved and when other methods
are theoretically superior to the method of least squares.

2. PRE-LEAST-SQUARES ERA (1632-1804)

Galileo Galilei (1632) considers the question of determining the distance
from the earth of a new star, given observations on its maximum and minimum
elevation (in degrees) and the elevation of the pole star by thirteen
observers at different points on the earth's surface. If the observations were
exact, the distance could be determined from the observations of any two
observers, and the 78 determinations made by pairing the observers in all
possible ways would all give the same result. Since the observations are
subject to error, 78 different distances of the star from the center of the
earth are found, ranging from a value less than the radius of the earth to
infinity and beyond. Both extremes are manifestly impossible. Galileo
states (p. 290 of the English translation): "Then these observers being
capable, and having erred for all that, and their errors needing to be
corrected for us to get the best possible information from their
observations, it will be appropriate for us to apply the minimum amendments and
smallest corrections that we can--just enough to remove the observations
from impossibility and restore them to possibility ...." In this statement
we see the beginnings of the theory of errors, which attempts to determine
the truth from inconsistent observations by minimizing various non-decreasing
functions of the errors.

Roger Cotes (1722), in his last paragraph (page 22), considers four
observations p, q, r and s, which may not be equally reliable, of the position
of a point. He proposes, as the most probable true position, a weighted
average with weights P, Q, R and S which are inversely proportional to the
spread of the errors to which the respective observations are subject. This
proposal represents one of the earliest attempts to determine an average
which uses all the observations but does not assign equal weights to all
of them.
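
The rule described above can be sketched in a few lines. The function name, the four sample positions, and the spreads are hypothetical; the only thing taken from the text is that each weight is inversely proportional to the spread of the errors of its observation.

```python
def cotes_weighted_average(observations, spreads):
    """Weighted average with weights inversely proportional to each spread."""
    if len(observations) != len(spreads):
        raise ValueError("need one spread per observation")
    # Weights P, Q, R, S in the notation of the text (hypothetical values here).
    weights = [1.0 / s for s in spreads]
    total = sum(weights)
    return sum(w * x for w, x in zip(weights, observations)) / total


# Hypothetical data: four observations p, q, r, s of one coordinate of a point,
# with the spread of error assumed for each.
positions = [10.0, 10.4, 9.8, 11.0]
spreads = [1.0, 2.0, 1.0, 4.0]
estimate = cotes_weighted_average(positions, spreads)
```

With equal spreads the function reduces to the ordinary arithmetic mean, which is the equal-uncertainty case of Cotes's rule.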

Leonhard Euler (1749) and Johann Tobias Mayer (1750), working
independently, developed what has come to be known as the Method of Averages
for fitting a linear equation to observed data. In this method the
observational equations are divided into as many subsets as there are
coefficients to be determined, the division being made according to the
values of (one of) the independent variable(s), those having the largest
values of this variable being grouped together, then the next largest in
another group, etc. Then the equations in each group are added together,
which is equivalent to applying to each subset the condition of zero sum
of residuals inherent in the method of Cotes for equal uncertainties of the
observations. The resulting equations, whose number is equal to the number
of coefficients to be determined, are then solved simultaneously. Mayer
gives a numerical example in which he uses twenty-seven observations on the
position of a moon spot to write twenty-seven equations each containing three
unknown quantities (the coefficients in the equation to be fitted), which he
divides into three groups of nine equations each. Then he adds all the
equations in each group and solves the resulting three equations
simultaneously to obtain the three unknown coefficients. A drawback of this method is
that the results depend on the way in which the observational equations are
divided into subsets, and are therefore somewhat arbitrary and subjective.
Euler (articles 122-123 of the cited work) is also credited with being the
first to use the minimax principle (minimization of the maximum residual
error) for solving a redundant system of linear equations.
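
The grouping-and-summing steps of the Method of Averages can be sketched as follows. This is an illustration under stated assumptions, not a reconstruction of Euler's or Mayer's computations: the function names, the rule of sorting on a single chosen column, the contiguous equal-sized groups, and the six sample observations are all hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("grouped equations are singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def method_of_averages(rows, rhs, sort_column):
    """Fit coefficients by summing the observational equations within groups.

    rows[i] holds the coefficients of equation i and rhs[i] its observed value.
    Equations are ordered by rows[i][sort_column] (largest first) and split
    into as many contiguous groups as there are unknowns.
    """
    k = len(rows[0])
    order = sorted(range(len(rows)), key=lambda i: rows[i][sort_column], reverse=True)
    size, extra = divmod(len(order), k)
    a, b, start = [], [], 0
    for g in range(k):
        stop = start + size + (1 if g < extra else 0)
        group = order[start:stop]
        # Adding the equations of a group forces its residuals to sum to zero.
        a.append([sum(rows[i][j] for i in group) for j in range(k)])
        b.append(sum(rhs[i] for i in group))
        start = stop
    return solve_linear_system(a, b)


# Hypothetical data for y = a + b x: each equation reads a*1 + b*x = y.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1, 10.9]
a_hat, b_hat = method_of_averages([[1.0, x] for x in xs], ys, sort_column=1)
```

Changing `sort_column` or the group boundaries changes the answer, which is the arbitrariness noted in the text.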

Christopher Maire and Roger Joseph Boscovich (1755) report on the results
of an expedition undertaken by the two authors under the auspices of Pope
Benedict XIV to measure two degrees of meridian and correct the map of the
Papal State. On pp. 499-501 the author (Boscovich) attempts to determine
the best value of the ellipticity of the earth from five measurements of
degrees of meridian (the new one by Maire and himself reported earlier in
the volume and four others) which he considers most reliable among a large
number of available measurements. If the earth were exactly an ellipsoid
of revolution and if the measurements were perfectly accurate, any two
measurements of degrees of meridian made at different latitudes would
determine its ellipticity exactly. But because the measurements are subject
to error, each of the 10 pairs of measurements yields a different value of
the ellipticity, which is inversely proportional to the excess of the polar
degree over the equatorial. If the ellipticity is computed from the
arithmetic mean of all ten excesses, the result is 1/255, but if the two most
discrepant values of the excess (one of which is actually negative) are
discarded and the ellipticity is computed from the arithmetic mean of the eight
remaining ones, the result is 1/195. Boscovich gives both of these results,
but is not satisfied with either.
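
The counts of pairwise determinations quoted in this section (78 for Galileo's thirteen observers, 10 for Boscovich's five arcs) are simply the numbers of unordered pairs. A small check, with a hypothetical helper name and with integer labels standing in for the observers and arcs:

```python
from itertools import combinations
from math import comb


def all_pairs(labels):
    """Return every unordered pair that can be formed from the labels."""
    return list(combinations(labels, 2))


galileo_pairs = all_pairs(range(13))   # thirteen observers
boscovich_pairs = all_pairs(range(5))  # five measured arcs
```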

Thomas Simpson (1756) points out that the practice of taking the mean
of a number of observations, while common among astronomers, has been
questioned by some persons of considerable note who have maintained that a single
observation, taken with due care, is as reliable as the mean of a great
number. In order to refute that position, he determines the distributions
of the mean errors of n independent observations from a discrete uniform
(rectangular) distribution and from a discrete isosceles triangular
population. He then compares these distributions with those of single observations
from the same populations, and shows that the probability is less that the
error of the mean of n observations equals or exceeds a given value than
that the error of a single observation equals or exceeds the same value, the
more so the greater the value of n.
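
Simpson's comparison can be reproduced numerically. The sketch below is not his derivation: it assumes integer-valued errors on -v, ..., v and obtains the distribution of the sum of n errors by repeated convolution with exact fractions. All function names and the choice v = 5 are illustrative.

```python
from fractions import Fraction


def error_sum_distribution(pmf, n):
    """Distribution of the sum of n independent errors; pmf maps value -> probability."""
    dist = {0: Fraction(1)}
    for _ in range(n):
        new = {}
        for s, ps in dist.items():
            for e, pe in pmf.items():
                new[s + e] = new.get(s + e, 0) + ps * pe
        dist = new
    return dist


def prob_mean_error_at_least(pmf, n, bound):
    """P(|mean of n errors| >= bound), computed as P(|sum| >= n * bound)."""
    dist = error_sum_distribution(pmf, n)
    return sum(p for s, p in dist.items() if abs(s) >= n * bound)


def uniform_pmf(v):
    """Discrete uniform (rectangular) errors on -v, ..., v."""
    return {e: Fraction(1, 2 * v + 1) for e in range(-v, v + 1)}


def triangular_pmf(v):
    """Discrete isosceles triangular errors: chances 1, 2, ..., v+1, ..., 2, 1."""
    weights = {e: v + 1 - abs(e) for e in range(-v, v + 1)}
    total = sum(weights.values())
    return {e: Fraction(w, total) for e, w in weights.items()}


single = prob_mean_error_at_least(uniform_pmf(5), 1, 2)
mean_of_six = prob_mean_error_at_least(uniform_pmf(5), 6, 2)
```

Here `single` is the chance that one observation errs by 2 or more, and `mean_of_six` is the corresponding chance for the mean of six observations.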

Boscovich (1757) summarizes the measurement of a meridian arc near Rome
and reevaluates the data on this and previous measurements given by Maire and
Boscovich (1755). He proposes for the first time two criteria for determining
the best-fitting straight line y = a + bx through three or more points: (1) The sums
of the positive and negative residuals (in the y-direction) shall be
numerically equal; and (2) the sum of the absolute values of the residuals shall
be a minimum. His first criterion requires that the best-fitting straight
line pass through the centroid (x̄, ȳ) of the observations, whose coordinates
are the arithmetic means of the x's and of the y's, respectively. The second
criterion is then applied subject to the restriction imposed by the first.
He proceeds to apply these criteria to the data of Maire and Boscovich, but
gives no indication of the method of solving the resulting equation for the
best value of the slope b.
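
One way to satisfy both criteria is sketched below. It is a standard modern reduction and is not claimed to be Boscovich's own procedure: once the line is forced through the centroid, the sum of absolute residuals is minimized by a weighted median of the slopes from the centroid to each point, with weights |x_i - x̄|. The function name and the sample data are hypothetical.

```python
def boscovich_line(xs, ys):
    """Line y = a + b x through the centroid minimizing the sum of |residuals|."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope from the centroid to each point, weighted by its horizontal distance.
    # Points directly above or below the centroid do not affect the slope.
    slopes = sorted(
        ((y - y_bar) / (x - x_bar), abs(x - x_bar))
        for x, y in zip(xs, ys)
        if x != x_bar
    )
    if not slopes:
        raise ValueError("all x values coincide; the slope is undetermined")
    half = sum(w for _, w in slopes) / 2.0
    running = 0.0
    b = slopes[-1][0]
    for slope, weight in slopes:
        running += weight
        if running >= half:      # weighted median reached
            b = slope
            break
    a = y_bar - b * x_bar        # criterion (1): pass through the centroid
    return a, b


# Hypothetical data with one discordant point.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 1.0, 2.0, 3.0, 14.0]
a_hat, b_hat = boscovich_line(xs, ys)
```

When the cumulative weight hits exactly half the total, every slope between two adjacent candidates gives the same minimum; the sketch returns the smaller one.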

Simpson (1757) repeats the material of his earlier paper, with two
notable additions. At the beginning he states explicitly for the first time the
assumptions that the error distribution is (1) symmetric (positive and
negative errors of the same magnitude are equally likely) and (2) limited in
extent (with limits depending on the goodness of the instrument and the skill
of the observer). Four pages of new material at the end are devoted to extension
to a continuous isosceles triangular error distribution of the results previously
given for the corresponding discrete distribution.


Boscovich (1760) gives (pp. 420-425) a geometric method of solving the
equations resulting from applying the criteria stated in his earlier paper to the
problem of finding the straight line y = a + bx of best fit to a number of points
which are not collinear, and applies this method to the same five meridian
arcs, obtaining the value 1/243 for the ellipticity of the earth. This method
is based on the ordered slopes of the lines connecting the five
observational points (x_i, y_i), i = 1, 2, ..., 5, to their centroid (x̄, ȳ).

According to Sheynin (1966), three works of Johann Heinrich Lambert
(1760, 1765a, b), none of which the present author has seen, contain: (1)
the first general outline since Galilei (1632) of the properties of errors
of observations; and (2) a rule for estimating the precision of measurements
by comparing the means taken with and without the most extreme observation.
In the first work (1760), Lambert uses the principle of maximum likelihood,
for which he gives a graphical method of solution; Sheynin notes, however,
that Lambert did not regard this principle as useful in practice, and never
returned to it. In the second work (1765a), Lambert states that the objectives
of the theory of errors are to find the relations between errors, their
consequences, the conditions of observation and the accuracy of instruments. He
also undertakes a study of the errors of functions of the observations, and
endeavors to determine the "true value" of the observed quantity and to
estimate the accuracy of the observations. He gives rules for fitting straight
lines and curves by dividing the observations into groups and taking their
centers of gravity instead of the original observations. In the third work
(1765b), Lambert gives a justification for preferring an arithmetic mean to
a single observation, a derivation of a semicircular probability density
function for the distribution of errors, and a statement of the minimax
principle (minimizing the maximum residual error), but confesses that he does
not know how to use this principle in a general and straightforward manner.

After solving several problems concerning averages of observations
having discrete error distributions, which are reminiscent of Simpson (1756,
1757), Joseph Louis Lagrange (1774) states and solves his Problem X: "One
supposes that each observation is subject to all possible errors between the
two limits p and -q, and that the facility of each error x, that is, the
number of cases in which it can occur, divided by the total number of cases,
is represented by any function whatever of x designated by y; one requires
the probability that the mean error of n observations shall be included
between the limits r and -s." He applies the result to two examples: (1) y = K
(a constant) [uniform or rectangular distribution of error]; (2) y = K(p^2 - x^2),
(-p, p) [parabolic distribution of error]. He remarks (p. 229) that the latter
appears to be "the simplest and most natural which one can imagine." He also
considers a Problem XI, which is essentially a third example of Problem X with
y = K cos x, (-π/2, π/2) [cosine distribution of error]. In each case the mean
error of n observations has smaller dispersion (the more so the larger n) than
the error of a single observation.

Pierre Simon Laplace (1774) considers the problem of determining the best
average of three observations. He proposes two criteria: (1) The average should
be such that it is equally likely to fall above or below the true value; and
(2) the average should be such that the sum of the products of the errors
and their respective probabilities is a minimum. He demonstrates that the
two criteria lead to the same average. Let x_1, x_2, x_3 be the three observations,
and let p = x_2 - x_1 and q = x_3 - x_2. Suppose that the true value is x_1 + x; then the
probability [density] that the three observations (assumed to have come from
a symmetric distribution) will fall at the points x_1, x_2, and x_3 will be f(x)
f(p-x)f(p+q-x), where f(x) is the probability [density] that a single
observation will fall at a distance x from the true value. Now construct a curve
whose equation is y = f(x)f(p-x)f(p+q-x). In order to satisfy Laplace's criteria
it is necessary to find the value of x such that an ordinate erected at the
abscissa x (measured from x_1) divides the area under this curve equally.
The solution depends, of course, on f(x). Laplace takes f(x) = (m/2)e^{-mx}
[the density function of what we now call Laplace's first distribution] and
finds the solution x = p + (1/m) ln[1 + (1/3)e^{-mp} - (1/3)e^{-mq}], which approaches the
arithmetic mean (2p+q)/3 as m → 0 and the median as m → ∞; for 0 < m < ∞, it lies
between the arithmetic mean and the median.
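
The closed form quoted above can be checked numerically. The sketch below assumes p >= q (so that the bisecting ordinate falls between the first two observations, the case the formula covers) and takes f(t) = (m/2)e^{-m|t|} for a signed error t; the function names, the trapezoidal integration, and the values p = 3, q = 1, m = 1 are illustrative choices.

```python
import math


def laplace_average(p, q, m):
    """Laplace's solution, measured from the smallest observation (assumes p >= q)."""
    return p + math.log(1.0 + math.exp(-m * p) / 3.0 - math.exp(-m * q) / 3.0) / m


def curve(x, p, q, m):
    """y = f(x) f(p - x) f(p + q - x) with f(t) = (m/2) exp(-m |t|)."""
    def f(t):
        return 0.5 * m * math.exp(-m * abs(t))
    return f(x) * f(p - x) * f(p + q - x)


def area(lo, hi, p, q, m, steps=20000):
    """Trapezoidal estimate of the area under the curve between lo and hi."""
    h = (hi - lo) / steps
    total = 0.5 * (curve(lo, p, q, m) + curve(hi, p, q, m))
    total += sum(curve(lo + i * h, p, q, m) for i in range(1, steps))
    return total * h


p, q, m = 3.0, 1.0, 1.0
x_star = laplace_average(p, q, m)
# The tails beyond 40 units from the observations are negligible for m = 1.
left = area(-40.0, x_star, p, q, m)
right = area(x_star, p + q + 40.0, p, q, m)
```

For these values `left` and `right` agree closely, and `x_star` lies between the arithmetic mean (2p+q)/3 and the median p.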

Daniel Bernoulli (1778) questions the practice common to astronomers
of rejecting completely observations judged to be too wide of the truth, but
assigning equal weights to all those retained. He advocates rejection of
observations only if an accident occurred which rendered an observation open
to question. He proposes a semicircular distribution of error, and discusses
the choice of diameter. As limiting cases, the choice of an infinite diameter
leads to taking the arithmetic mean as the average of the observations, while
diminishing the diameter as much as possible without contradiction leads to
taking the midrange. He proposes what has come to be known as the method
of maximum likelihood to determine the average of a number of observations.
For two observations, the result is equal to the arithmetic mean. For three
observations, x_1 < x_2 < x_3, the result is greater than, equal to, or less than
the arithmetic mean (x_1 + x_2 + x_3)/3 according as the median is less than,
equal to, or greater than the midrange (x_1 + x_3)/2. For more than three
observations, the method becomes unwieldy, since for n observations it requires
solution of an equation of degree (2n-1). In commenting on Bernoulli's paper,
Euler (1778) proposes maximizing the sum of the fourth powers of the
probability densities of the errors of the observations instead of maximizing
their product (the likelihood function). He advances certain unconvincing
arguments for the use of his criterion instead of Bernoulli's, and works
out two examples based on real observations. The really vulnerable part of
Bernoulli's method, as Isaac Todhunter (1865) has pointed out, is not the
principle of likelihood, but the particular law of probability assumed.
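
The behavior described for two and three observations can be checked with a numerical maximization. The sketch below does not solve Bernoulli's equation of degree 2n-1; it assumes a semicircular density with ordinate proportional to sqrt(r^2 - e^2) for an error e and maximizes the log-likelihood by ternary search, which is valid here because the log-likelihood is concave in the center. Function names, radii, and data are hypothetical.

```python
import math


def semicircle_log_likelihood(center, observations, radius):
    """Log of the product of semicircular ordinates sqrt(r^2 - (x - c)^2)."""
    total = 0.0
    for x in observations:
        gap = radius * radius - (x - center) ** 2
        if gap <= 0.0:
            return float("-inf")  # observation would fall outside the semicircle
        total += 0.5 * math.log(gap)
    return total


def bernoulli_ml_average(observations, radius, iterations=200):
    """Center that maximizes the likelihood, found by ternary search."""
    lo = max(observations) - radius
    hi = min(observations) + radius
    if lo >= hi:
        raise ValueError("radius too small to cover all observations")
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        ll1 = semicircle_log_likelihood(m1, observations, radius)
        ll2 = semicircle_log_likelihood(m2, observations, radius)
        if ll1 < ll2:
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)


# Hypothetical observations: the median (1) is below the midrange (5).
data = [0.0, 1.0, 10.0]
estimate = bernoulli_ml_average(data, radius=8.0)
```

For this data the estimate falls above the arithmetic mean 11/3 and below the midrange 5. Shrinking the radius toward half the range pushes it to the midrange, and enlarging the radius pulls it toward the mean, in line with the limiting cases stated in the text.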

Laplace (1781) extends the theory given in his earlier paper to any number
of observations and generalizes it to the case in which each observation may
have a different law of facility of error. He states that one can make
infinitely many choices of an average according as one imposes various criteria,
of which he enumerates four: (1) One may require that average such that the
sum of the positive errors equals the sum of the negative errors [the arithmetic
mean]; (2) one may require that the sum of the positive errors multiplied
by their respective probabilities equal the sum of the negative errors
multiplied by their respective probabilities; (3) one may require that the average
be the most probable true value [Daniel Bernoulli's maximum likelihood
criterion]; or (4) one may require that the error be a minimum, i.e. that the sum
of the products of the errors (taken without regard to sign) and their
respective probabilities be a minimum. He shows that criterion (4), which he regards
as the fundamental one, is equivalent to criterion (2). He also shows that
criterion (4) leads to the arithmetic mean, and hence agrees with criterion
(1), when the following conditions are satisfied: (1) The law of facility
of error is the same for all the observations; (2) positive and negative
errors of the same magnitude are equally probable; and (3) errors can be
infinite, but the probability of an error x tends to zero as |x| → ∞.

Jean Bernoulli III (1785), in an article on averages, refers to the
methods of Boscovich (1757, 1760) and Lambert (1765a), and gives fuller accounts
of the memoirs of Lagrange (1774) and Daniel Bernoulli (1778), the latter
differing somewhat from the published version. The discrepancy is apparently
accounted for by the fact that the summary given is based on a preliminary
1769 version in which a semicircular distribution of error is assumed, as in
the version published in 1778, but the method of maximum likelihood is not
employed. Instead, the following iterative procedure is used: First take the
mean of all the observations as the center of the semicircle and determine the
center of gravity of the area corresponding to the observations; take this
point as the center of a new semicircle, and repeat the operation until the
center of gravity and the center of the semicircle coincide.
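
The iteration can be sketched under one explicit assumption that the text does not settle: "the center of gravity of the area corresponding to the observations" is read here as the average of the observations weighted by the semicircle ordinates sqrt(r^2 - (x_i - c)^2), with zero weight for any observation outside the current semicircle. The function name, radius, and data are hypothetical.

```python
import math


def iterated_semicircle_center(observations, radius, tol=1e-12, max_iter=1000):
    """Repeat the weighted-average step until the center stops moving."""
    center = sum(observations) / len(observations)   # start from the mean
    for _ in range(max_iter):
        # Assumed weighting: ordinate of the semicircle above each observation.
        weights = [
            math.sqrt(max(radius ** 2 - (x - center) ** 2, 0.0))
            for x in observations
        ]
        total = sum(weights)
        if total == 0.0:
            raise ValueError("no observation lies inside the semicircle")
        new_center = sum(w * x for w, x in zip(weights, observations)) / total
        if abs(new_center - center) < tol:
            return new_center
        center = new_center
    return center


# Hypothetical data with one distant observation.
data = [0.0, 1.0, 10.0]
center = iterated_semicircle_center(data, radius=8.0)
```

Under this reading the distant observation receives less and less weight from one pass to the next, so for this data the final center lies well below the arithmetic mean.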

Laplace (1786), given three or more non-collinear pairs of observations
of two variables, x and y, proposes testing the adequacy of the linear relation
$y = a + bx$ by first determining a and b so as to minimize the maximum absolute
deviation from the fitted straight line, then deciding subjectively whether
a deviation of this magnitude is consistent with the limits of the errors to
which the observations are susceptible. He gives a procedure for determining
the required values of a and b. In a later paper, Laplace (1793) gives a
procedure which he says is much simpler. He observes that when the absolute
value of the largest deviation is made a minimum, there are actually three
observations whose deviations, two with one sign and one with the other, have
this same absolute value. He offers another method of treating the observations,
based on the criteria that (1) the sum of the deviations should be zero and
(2) the sum of the absolute deviations should be a minimum. These criteria
were first proposed by Boscovich (1757). Laplace develops an analytic
procedure based on these criteria, while the procedure used by Boscovich (1760)
was geometric. Laplace applies both his methods to data on lengths of degrees
of meridian and on lengths of the seconds pendulum, both of which he uses
to determine the earth's ellipticity. In the second volume of his two-volume
treatise on celestial mechanics, Laplace (1799) summarizes the results of his
earlier papers, again proposing the same two methods for determining the
straight line $y = a + bx$ which best fits three or more points whose
coordinates are pairs of related observations: (1) minimizing the maximum
residual; and (2) minimizing the sum of the absolute residuals subject to the
restriction that the sums of the positive and negative residuals shall be
numerically equal.
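
As a minimal sketch of the two criteria just described, the following Python
code fits a straight line to a few points, first by making the largest absolute
residual a minimum and then by making the sum of the absolute residuals a
minimum subject to a zero algebraic sum. The function names and the sample
points are illustrative assumptions. The search over triples of points and the
weighted-median slope are modern conveniences and are not claimed to reproduce
the procedures of Laplace or Boscovich.

```python
from itertools import combinations

# Hypothetical sample data: four collinear points and one discordant point.
POINTS = [(0, 1), (1, 2), (2, 3), (3, 4), (5, 12)]


def minimax_line(points):
    """Return (a, b, worst) for the line y = a + b*x with least maximum residual.

    Brute force, assumed here for clarity: the best line has three residuals
    of equal size and alternating sign, so every triple is tried and the
    candidate with the smallest worst residual over all points is kept.
    """
    best = None
    for (x1, y1), (x2, y2), (x3, y3) in combinations(sorted(points), 3):
        if x1 == x3:
            continue  # three points sharing one abscissa define no line
        b = (y3 - y1) / (x3 - x1)        # parallel to the chord of the outer points
        gap = y2 - (y1 + b * (x2 - x1))  # offset of the middle point from the chord
        a = y1 - b * x1 + gap / 2        # shift the chord halfway toward it
        worst = max(abs(y - (a + b * x)) for x, y in points)
        if best is None or worst < best[2]:
            best = (a, b, worst)
    return best


def zero_sum_lad_line(points):
    """Return (a, b) minimizing the sum of |residuals| with residuals summing to 0.

    A zero algebraic sum forces the line through the centroid, so only the
    slope is free; it is a weighted median of the slopes from the centroid.
    """
    n = len(points)
    xbar = sum(x for x, _ in points) / n
    ybar = sum(y for _, y in points) / n
    slopes = sorted(((y - ybar) / (x - xbar), abs(x - xbar))
                    for x, y in points if x != xbar)
    half = sum(w for _, w in slopes) / 2
    running = 0.0
    for slope, weight in slopes:
        running += weight
        if running >= half:
            return ybar - slope * xbar, slope
    return None  # all abscissas equal: the slope is undetermined
```

For these sample points the two criteria give noticeably different lines, since
the first is governed entirely by the three most extreme residuals.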

Gaspard Clair Francois Marie Riche Prony (1804) gives a geometric
interpretation of the two methods of Laplace (1799), applies them to actual data,
and compares the results with those obtained by a third method (his own) based
on the idea that the deviation to be expected should be proportional to the
independent variable x, or almost so.

Jean Trembley (1804), after brief mention of the work of Lambert, Laplace,
and Daniel Bernoulli on the most advantageous method of taking averages of
observations, turns to the work of Lagrange (1774) on the same problem. He
states that his purpose is to use combinatorial theory to obtain the same
results which Lagrange obtained by the use of integral calculus. He succeeds
in using combinatorial theory to obtain results for discrete error
distributions which Lagrange found with the aid of differential calculus and
Simpson (1756, 1757) by series expansions. He does not treat the case of
continuous error distributions, which is the only one for which Lagrange
employed integral calculus.

3. EIGHTY YEARS OF LEAST SQUARES (1805-1884)

Adrien Marie Legendre (1805), while not the first to use the method of
least squares, was the first to publish it. He starts with the linear form
$E = a + bx + cy + \cdots$, where a, b, c, ... are known coefficients which vary
from one equation to another and x, y, ... are unknowns which must be determined
by the condition that the value of E reduces, for each equation, to zero or a
very small number. He derives the normal equations without the explicit use
of calculus by multiplying the linear form in the unknowns by the coefficient
of each of the unknowns and summing over all the observations, then setting
the sums equal to zero. If the results, when substituted in the normal
equations, produce one or more errors judged too large to be admissible, he
recommends rejecting the equations which produced them, and determining the
unknowns from the remaining equations. Though he offers no mathematical proof
of the method of least squares, Legendre makes the following claim for its
superiority: "Of all the principles which one can propose for this object, I
think that none is more general, more exact, or easier to apply than the one
which we have used in the preceding research, which consists in making the
sum of the squares of the errors a minimum. By this means, a sort of
equilibrium among the errors is established which, preventing the extremes from
prevailing, is most proper to make known the state of the system nearest to
the truth." [Translation by present writer of statements on pp. 72-75].
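
The construction of the normal equations described above can be sketched in
Python for the case of two unknowns. This is only an illustration under stated
assumptions: the function name, the restriction to two unknowns, the solution
by Cramer's rule, and the sample coefficients are all hypothetical and do not
come from Legendre's text.

```python
def legendre_normal_equations(equations):
    """Least-squares values of x, y for equations E = a + b*x + c*y.

    `equations` is a list of (a, b, c) triples, one per observation.
    Multiplying each E by the coefficient of x (then of y), summing over
    the observations and setting each sum to zero gives
        S_bb*x + S_bc*y = -S_ab
        S_bc*x + S_cc*y = -S_ac
    """
    s_bb = sum(b * b for _, b, _ in equations)
    s_bc = sum(b * c for _, b, c in equations)
    s_cc = sum(c * c for _, _, c in equations)
    s_ab = sum(a * b for a, b, _ in equations)
    s_ac = sum(a * c for a, _, c in equations)
    det = s_bb * s_cc - s_bc ** 2
    if det == 0:
        raise ValueError("the two unknowns cannot be separated")
    x = (-s_ab * s_cc + s_ac * s_bc) / det
    y = (-s_ac * s_bb + s_ab * s_bc) / det
    return x, y


# Hypothetical, slightly inconsistent equations of condition.
EQUATIONS = [(-2.1, 1, 0), (0.9, 0, 1), (-1.0, 1, 1), (-2.8, 1, -1)]
X, Y = legendre_normal_equations(EQUATIONS)
ERRORS = [a + b * X + c * Y for a, b, c in EQUATIONS]
```

Inspecting `ERRORS` after the fit corresponds to the step in which equations
yielding inadmissibly large errors would be set aside.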

Puissant (1805) gives a theoretical discussion of the method of least
squares, followed by an application to the determination of the ellipticity
of the earth from measures of degrees of meridian. He mentions the method
of conditional equations [method of averages] proposed by Mayer and the
[Boscovich] method (preferred, he says, by Delambre) which gives "the least
errors of latitude, half positive, half negative." He also applies the method of
least squares to the determination of the ellipticity of the earth from the
lengths of seconds pendulums, and compares the results with those obtained by
Mathieu by minimizing the maximum discrepancy between observed and fitted
values, as proposed by Laplace (1799).

Svanberg (1805), in the preliminary discourse of a book describing the
measurement of a meridian arc in Lapland by Svanberg and three colleagues,
compares the results obtained by applying the two methods proposed by Laplace
(1799) to the determination of the earth's ellipticity from fifteen
measurements of the lengths of seconds pendulums and of degrees of meridian by
various observers at different latitudes. No mention is made of the method of
least squares; it is reasonable to assume that, at the time of writing, the
author had not heard of it. The same assumption is probably valid in the case of
von Zach (1805), who expresses the opinion that little reliance can be placed
on the arithmetic mean when it does not stand equally far from the extremes.
He reviews the work of Lambert (1765a) and Daniel Bernoulli (1778), but
expresses a preference for the modification of Bernoulli's procedure due to
Euler (1778), which he applies to data on terrestrial refraction and barometric
pressure.

Jean Baptiste Delambre (1806-10) gives a three-volume report on a
vast undertaking, carried out under the auspices of the Academie des Sciences
with the support of the French government, to establish the base of the metric
system (1 meter = one ten-millionth of the distance from the Equator to the
North Pole) by measuring the meridian arc between the parallels of Dunkerque
and Barcelona (over 9°). On page 117 of the first volume, Delambre places
himself squarely on the side of those who never suppress an observation or
assign it a smaller weight simply because it deviates from other
observations of the same kind. On pages 92 and 110 of the third volume, he
compares values of the earth's eccentricity (ellipticity) calculated from the
observations of Delambre and Mechain by Laplace (1799), by Legendre (1805),
and by himself. Laplace [by minimizing the maximum deviation] obtained the
value 1/150; Legendre [by the method of least squares], 1/148; and Delambre
[by an unspecified method, probably that of Delambre (1813)], 1/133.
However, by combining the observations of Delambre and Mechain with those made
by Bouguer in Peru about 60 years earlier, the task force obtained the value
1/334, which agrees much better with results obtained from measurements of
the length of a pendulum of known period and with those predicted by the
theory of nutation and precession. This latter value was used in
determining the length of the standard meter.

Carl Friedrich Gauss (1806) claims priority in the use (though not in
the publication) of the method of least squares in the following words
(p. 184): "I still have not seen Legendre's [(1805)] work. I have purposely
not taken the trouble to do so, in order that the work on my method shall
remain entirely my own ideas. Through a few words, method of least squares,
which de Lalande let fall in the last History of Astronomy, 1805, I arrive
at the supposition that a fundamental theorem, which I myself have already
used for twelve years in many calculations, and which I will also use in my
work [Gauss (1809)], whether or not it belongs essentially to my method--
that this fundamental theorem is also employed by Legendre." [Translation of
portion quoted by Merriman (1877), pp. 162-163].

Bernhard August von Lindenau (1806) states Laplace's (1799) analytic
form of the method of Boscovich (1760), as well as Legendre's (1805) method
of least squares, and applies both in the determination of the elliptic
meridian. He does not comment as to the relative merits of the two methods,
but reports that, in at least one instance, they yield very nearly the same
results.

Robert Adrain (1808), apparently unaware of the work of Legendre (1805)
and of the (as yet unpublished) work of Gauss, independently develops the
method of least squares and uses it to solve the following problems: (1)
Suppose a, b, c, d, ... to be the observed measures of any quantity x, the most
probable value of x is required [Ans. the arithmetic mean of the observations];
(2) Given the observed positions of a point in space, to find the most
probable position of the point [Ans. the center of gravity of the observed
positions]; (3) To correct the dead reckoning at sea, by an observation of
the latitude [the answer differs from all rules previously used, which he
hopes will be abandoned]; (4) To correct a survey. The author mentions that
he has also used the same principle to determine the most probable value of
the earth's ellipticity. These last results were not published until ten
years later [Adrain (1818a)].


Gauss (1809) deduces the normal (Gaussian) law of error from the
postulate that when any number of equally good direct observations of an
unknown quantity x are given, the most probable value is their arithmetic
mean. He shows that the method of least squares, used by him since 1795,
but named by Legendre (1805), follows as a consequence of the Gaussian law
of error. If one does not assume this law, he might minimize the sum of the
$2n$-th powers of the errors for $n = 1, 2, 3, \ldots$, but Gauss points out that
minimizing the sum of their squares ($n = 1$) is simplest. Letting $n \to \infty$
[Laplace's method of situation] is equivalent to minimizing the maximum errors
(one positive and one negative, equal in magnitude). Gauss also mentions
Laplace's other principle, first proposed by Boscovich, of making the sum of the
absolute values of the deviations a minimum. He was apparently unaware that
Boscovich proposed to minimize the sum of the absolute values of the deviations
subject to the restriction that the sums of the positive and negative
deviations shall be equal, since he speaks of this restriction as one added by
Laplace. He does not mention the fact, though he may have been aware of it,
that this restriction results in minimizing the sum of the absolute deviations
from the arithmetic mean instead of from the median. The same is true of
the fact that minimizing the sum of the $2n$-th powers for $n \to \infty$ results
in the choice of the midrange as an average instead of the arithmetic mean.
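
The dependence of the resulting average on the power of the errors that is
minimized can be checked numerically. The sketch below is an illustration only:
the function name, the sample data, and the use of a ternary search (valid here
because the cost is convex in the trial value) are assumptions, not anything
taken from Gauss.

```python
def best_center(data, power, iterations=200):
    """Value c minimizing sum(|x - c|**power), located by ternary search."""
    lo, hi = min(data), max(data)

    def cost(c):
        return sum(abs(x - c) ** power for x in data)

    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if cost(m1) < cost(m2):
            hi = m2  # the minimum cannot lie to the right of m2
        else:
            lo = m1  # the minimum cannot lie to the left of m1
    return (lo + hi) / 2


# Hypothetical observations: median 3, arithmetic mean 4, midrange 5.5.
DATA = [1, 2, 3, 4, 10]
FIRST_POWER = best_center(DATA, 1)   # close to the median
SQUARES = best_center(DATA, 2)       # close to the arithmetic mean
HIGH_POWER = best_center(DATA, 40)   # close to the midrange
```

With a single quantity to be determined, the restriction that the deviations
sum to zero leaves no freedom at all: it is satisfied only by the arithmetic
mean, which is the point made in the paragraph above.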

Laplace (1810) shows that if random samples of size n are drawn from
a distribution with mean $\mu$ and known dispersion, then the distribution of
sample means has mean $\mu$ and dispersion $1/\sqrt{n}$ times that of the parent
distribution; moreover, under very general conditions [which Laplace does
not state explicitly] on the parent distribution, the distribution of sample
means tends to normality as the sample size n increases. As Eisenhart (1964)
has pointed out, these results greatly strengthen the justification given
by Gauss (1809) for the use of the method of least squares, especially when
dealing with a large number of observations. In a supplement, Laplace shows
that when the law of error is the normal law, his own "most advantageous
method" [Laplace (1781)], the method of maximum likelihood [Bernoulli (1778),
Euler (1778) and Gauss (1809)], and the method of least squares [which he
introduces without reference to either Legendre (1805) or Gauss (1809)] are all
equivalent and lead to the choice of the arithmetic mean as the average of
a number of observations.

Friedrich Wilhelm Bessel (1810) uses the method of least squares to
determine the orbit of a comet and Gauss (1811) uses it to determine the
orbit of the asteroid Pallas. Gauss obtains twelve equations involving six
unknown corrections to the elements of the orbit. Because the nature of the
observations which furnish the tenth of these equations does not inspire
confidence, he discards that equation and determines the unknowns from the
other eleven. Merriman (1877), p. 166, notes: "We find here for the first
time the notation $[ab] = a'b' + a''b'' + a'''b''' + \cdots$ and also the
algorithm for the solution of normal equations by successive substitution,
since universally followed in lengthy computations ..."

Laplace (1811a) considers, in his Articles VI and VII, the problem of
choosing the average to take of n observations in order to correct an
element already known approximately. He finds that the normal (Gaussian)
law is the only one of the form $f(x) = K e^{-g(x^2)}$, where $g(x^2)$ is
continuous, for which the arithmetic mean is the "most advantageous" in the
sense of Laplace (1781). However, because of the rudimentary form of the central
limit theorem given by Laplace (1810), choice of the arithmetic mean is
advantageous when the number of observations is large or when one is taking the
average of results each based on a large number of observations, and hence in
these cases one may use the method of least squares, which Gauss (1809) developed
from the postulate that the arithmetic mean is the best average of a number
of observations. In his Article VIII [reprinted as Laplace (1811b)], Laplace
extends these results to the case of correcting two unknown elements [regression
coefficients]. His analysis is already quite laborious for this case, but
he indicates that the results hold for any number of unknown elements whatever.

Laplace (1812), in his monumental work on the analytic theory of
probabilities, summarizes the results of his study spanning almost four decades.
Articles 20-24 of his Book II, Chapter iv, which is entitled "Of the
probability of errors of the mean results of a large number of observations, and
of the most advantageous mean results," contain most of the relevant material.
Articles 20 and 21, which deal respectively with the correction of one or two
elements, already known approximately, by the aggregate of a large number of
observations, and which contain Laplace's "proof" of the method of least
squares, follow closely the treatment of Laplace (1811a,b). Article 22, which
deals with the case in which the facility of positive errors is not the same
as that of negative ones [the distribution of errors is not symmetric], follows
Laplace (1810). Article 23, unlike the preceding ones, deals with the case
in which the observations have already been made. The idea of the "most
advantageous" average as the abscissa corresponding to the ordinate which
divides equally the area under the [joint] probability [density] curve
[likelihood curve] of the observations goes back to two of Laplace's earliest
memoirs [Laplace (1774, 1781)]. The author also summarizes the results of
the supplement of Laplace (1810) and gives a more straightforward proof
than that of Laplace (1811a) of the fact that the normal law of error is
the only one of the form $f(x) = K e^{-g(x^2)}$ for which the arithmetic mean is
most advantageous. In Article 24, the author mentions various other methods
of averaging observations, including the one proposed by Cotes (1722) and
applied by Euler (1749) and Mayer (1750), and the one based on minimizing
the sum of the $2n$-th powers of the deviations, which for $n \to \infty$ is
equivalent to minimizing the maximum deviation, as proposed by Laplace
(1786, 1793). He concludes that the best choice of method depends on the law of
error when the number of observations is small, but recommends the method of
least squares proposed by Legendre (1805) and Gauss (1809) for use whenever the
number of observations is large. In the second supplement (first published in
1818), Laplace explains the method proposed by Boscovich (1757, 1760) [see also
Laplace (1793, 1799)] based on minimizing the sum of the absolute values of
the deviations subject to the restriction that their algebraic sum be zero,
to which he gives the name "method of situation." He determines a condition
under which this method is preferable to his own "most advantageous method,"
and explores the possibility of finding a weighted average of the two which
is more precise than either.

Delambre (1813) returns to the question [see Delambre (1806-10)] of
determining the eccentricity of the earth from inconsistent observations on
the lengths of meridian arcs. On page 608, he advocates a method, which is
probably the one he used in his earlier work, in the following words: "It
seems that one should seek neither the least sum of errors nor the least
sum of squares, but the least errors, half negative, half positive." Since
the least sum of absolute deviations is achieved when the deviations are
taken from the median, in which case half the deviations are negative and
half positive, it appears that the author, perhaps without realizing it, is
advocating the Boscovich-Laplace method without the restriction that the
sums of the positive and negative deviations be equal in magnitude, which
requires that deviations be taken from the arithmetic mean rather than from
the median. Delambre then applies his method to the determination of the
earth's eccentricity from the Delambre-Mechain observations.

Legendre (1814) sets the stage for a quotation from pp. 72-75 of his
earlier work [Legendre (1805)] by stating that Laplace (1812) has found by
reasoning based on the calculus of probabilities that the method of
least squares should be used in preference to all others to find the most
exact average value of one or of several unknown elements among all those
which are given by different observations. In so doing, he overstates, as
many later writers do, the generality of what Laplace actually proved about
the method of least squares, which is that the method of least squares is
best for the normal error law and asymptotically best (as the number of
observations $n \to \infty$) for other error laws satisfying certain conditions.

Jan Frederik van Beeck Calkoen (1816) discusses the average value of
a certain number of quantities or of separate observations. For several
observations of a single quantity, he advocates the use of the arithmetic
mean. If one of the observations differs from the mean by an amount greater
than the assumed limit of error, that observation is discarded, and the
arithmetic mean of the remaining ones is taken. For observations on two
related quantities, he proposes two methods of determining the best fitting
straight line. The first, which he attributes to Lambert (1765a), involves
dividing the points representing the pairs of observed (x,y) values into
two groups (as nearly as possible equal in number), one containing the points
with the smallest abscissas and the other those with the largest abscissas,
and joining the centers of gravity of the two sets of points. The other
method is based on the use of the Boscovich criteria, which the author
attributes to Laplace (1793). By taking $x^2$ or $\sqrt{x}$ rather than x as the
independent variable, the author obtains curvilinear regression equations of
the forms $y = a + bx^2$ and $y = a + b\sqrt{x}$ as well as the linear regression
equation of the form $y = a + bx$. He advocates using that power of x which gives
the best fit in the sense that the sum of the absolute deviations of the observed
points from the fitted curve is smallest, subject to the condition that the
algebraic sum is zero. It is interesting to note that he makes no mention of the
method of least squares, although the work of Legendre (1805), Gauss (1809), and
Laplace (1812) was already widely known.

Gauss (1816) points out that it is not necessary to know the precision
h [$= 1/(\sigma\sqrt{2})$, where $\sigma$ is the standard deviation] of the
observations in order to apply the method of least squares, and that the relation
of the precision of the results to that of the observations is independent of h,
but that the value of h is itself interesting and instructive. He then proceeds
to give various methods of determining h, including methods based on the n-th
root of the sum of the n-th powers of the absolute errors (deviations from the
true value) for n = 1, 2, 3, 4, 5, 6, and an alternate method based on the median
M of the absolute values of the errors. He shows that the method based on n = 2
gives the greatest precision for samples from a normal population, 100
observations for n = 2 yielding the same precision as 114 for n = 1, 109 for
n = 3, 133 for n = 4, 178 for n = 5, 251 for n = 6, or 249 [actually 272; see
comments below] for the alternate method based on M, but notes that the last
method and the one based on n = 1 are arithmetically more convenient. Although he
gave the correct mathematical expression for the probable error of the median
absolute error M, Gauss made a mistake in calculating the value of the numerical
coefficient. Several later writers, including [name illegible] (1833), Encke
(1832-34), and [name illegible] (1865), have given the correct value, and it is
interesting to note that the first two, writing during the lifetime of Gauss,
did so without mentioning his mistake, which remains uncorrected in his
collected works.
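
The numbers of observations of equal precision quoted for Gauss (1816),
including the corrected value 272 for the median-based method, can be reproduced
with a short calculation. The sketch below is a modern reconstruction under
stated assumptions, not Gauss's own derivation: it assumes normally distributed
errors and uses the usual large-sample approximations for the variance of a
function of a sample moment and of a sample median. All function names are
hypothetical.

```python
import math


def abs_moment(k):
    """E|Z|**k for a standard normal variable Z."""
    return 2 ** (k / 2) * math.gamma((k + 1) / 2) / math.sqrt(math.pi)


def rel_var_moment_method(n):
    """Large-sample relative variance (times sample size) of the scale
    estimate based on the n-th root of the mean n-th absolute error."""
    m_n = abs_moment(n)
    m_2n = abs_moment(2 * n)
    return (m_2n - m_n ** 2) / (n ** 2 * m_n ** 2)


def rel_var_median_method():
    """Same quantity for the estimate based on the median absolute error."""
    lo, hi = 0.0, 2.0
    for _ in range(100):  # bisection for the median m of |Z|
        mid = (lo + hi) / 2
        if math.erf(mid / math.sqrt(2)) < 0.5:
            lo = mid
        else:
            hi = mid
    m = (lo + hi) / 2
    density = 2 * math.exp(-m * m / 2) / math.sqrt(2 * math.pi)  # density of |Z| at m
    return 1 / (4 * density ** 2 * m ** 2)


def equivalent_observations():
    """Observations needed to match 100 observations used with n = 2."""
    base = rel_var_moment_method(2)
    table = {n: round(100 * rel_var_moment_method(n) / base) for n in range(1, 7)}
    table["M"] = round(100 * rel_var_median_method() / base)
    return table
```

Calling `equivalent_observations()` returns a dictionary keyed by n = 1, ..., 6
and by "M" for the median-based method.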

Adrain (1818a) calculates the earth's ellipticity by the method of least
squares from data on the lengths of pendulums vibrating seconds at different
latitudes given by Laplace (1799). He compares the results not only with
those obtained by Laplace, based on the criteria of Boscovich (1757), but
also with the results obtained by that method after correcting two errors
made by Laplace. He finds that most of the discrepancy between Laplace's
results and his own is due to those errors. The corrected results of applying
the Boscovich-Laplace method, based on minimizing the sum of the absolute
values of the residuals subject to the restriction that the algebraic sum of
the residuals shall be zero, differ by less than 1% from those obtained by
the method of least squares. In another paper, Adrain (1818b) uses the method
of least squares to find the diameter of the sphere (7918.7 miles) which most
nearly coincides in various specified peculiarities with the actual terrestrial
spheroid, given measurements of degrees of meridian.

In 1821 there appeared an anonymous paper whose authorship Czuber (1891a,
1899) attributes to Svanberg. The author gives a discussion, which is as much
philosophical as mathematical, of the problem of finding the best average of
a number of observations. He distinguishes between two cases, one in which
the observations are all made on the same identical object and thus differ
only because of errors of observation and the other in which observations are
made on a quantity which is itself variable. He traces the history of the
problem from the time when the arithmetic mean was used without question,
through the period in which students of the theory of probability (among
whom he mentions Boscovich, D. Bernoulli, Lambert and Lagrange) questioned
its use, to the time when wide acceptance of the method of least squares
developed by Legendre (1805) and Gauss (1809) led to the belief that the
arithmetic mean is the most probable value. He pleads for further
examination of the question, raising objection to the use of the arithmetic
mean when the observations are not closely bunched, especially if they are so
asymmetric that there are many more on one side of the arithmetic mean than
on the other, or when there is reason to believe that they are not all equally
reliable. He mentions a number of other possible averages, such as the median,
the midrange, and the arithmetic mean of those remaining after discarding the
(one or more) largest and smallest observations. He concludes that the problem
of the best average depends on the law of facility of error and hence has no
general solution. Nevertheless, at the end of the paper he proposes an
iterative procedure which starts from the arithmetic mean (or some other
reasonable value), then takes the reciprocals of the residuals (or their squares)
as weights of the corresponding observations and thus obtains a second
approximation, which gives new residuals, after which the process is repeated
until it converges.
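
The iterative procedure described at the end of the preceding paragraph can be
sketched as follows. This is a minimal illustration under stated assumptions:
the function name, the stopping tolerance, the iteration limit, the sample data,
and in particular the small floor placed under each residual (to avoid division
by zero when the current value coincides with an observation) are additions
made here and are not described in the source.

```python
def reweighted_average(data, power=1, floor=1e-12, tol=1e-12, max_iter=1000):
    """Average obtained by repeatedly weighting observations by 1/|residual|**power.

    power=1 uses reciprocals of the absolute residuals; power=2 uses
    reciprocals of their squares. `floor`, `tol` and `max_iter` are
    assumptions introduced for numerical safety.
    """
    estimate = sum(data) / len(data)  # start from the arithmetic mean
    for _ in range(max_iter):
        weights = [1.0 / max(abs(x - estimate), floor) ** power for x in data]
        new_estimate = sum(w * x for w, x in zip(weights, data)) / sum(weights)
        if abs(new_estimate - estimate) <= tol:
            return new_estimate
        estimate = new_estimate
    return estimate


# Hypothetical observations with one discordant value.
DATA = [1, 2, 3, 4, 100]
RESULT = reweighted_average(DATA)
```

With reciprocals of the absolute residuals as weights, the result for these
sample data moves away from the arithmetic mean (22) and settles near the
median (3), so the discordant observation ends up with very little influence.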

Gauss (1823) compares his earlier formulation of the method of least
squares [Gauss (1809)] with that of Laplace (1812), and concludes that neither
is entirely satisfactory. The former is based on the assumption that the
errors of observation follow a normal (Gaussian) distribution, which follows
from his postulate that the best average of the observations is their arithmetic
mean; the latter is based on a proof that the method of least squares yields a
result which is best asymptotically (in the sense of minimizing the sum of the
absolute values of the residuals subject to the restriction that the algebraic
sum of the residuals shall be zero, when the number of observations is
sufficiently large), whatever the distribution of errors [under very general
conditions not stated by Laplace]. That left a gap, which the author now proposes
to fill, for the case of a small or moderate number of observations whose errors
are not normally distributed. Gauss begins by comparing the situation to
a game in which there is no gain to hope for, but a loss to fear, the problem
being how to minimize the loss, which is assumed to be the same for positive
and negative errors of equal magnitude. This assumption can be met by choosing
a loss function proportional to the sum of the absolute values of the errors,
as Laplace did, or to the sum of their n-th powers, n being a positive even
integer. In the German summary, but not in the Latin text, Gauss points out,
as Laplace had already done, that the larger the exponent n becomes, the
nearer one comes to the situation where the most extreme errors alone serve
as a measure of precision. Gauss chooses n = 2, which besides being the simplest
of its type also possesses certain desirable properties [see Gauss (1816) for
a proof, assuming a normal distribution, of these properties, which,
unfortunately for his argument, do not hold for certain other common
distributions]. On the basis of this choice he justifies the use of the method of
least squares, whatever the number of observations and whatever the distribution
of their errors. Gauss' second exposition seems to the present writer to be no
more satisfactory than his first. In each case he starts from a postulate,
plausible but not universally valid, which leads inexorably to the foregone
conclusion. Nevertheless, his argument apparently convinced his
contemporaries, since the literature of the next few decades includes many
writings on least squares but only a few on rival methods.

Augustin-Louis Cauchy (1824), given a large number of observations of
two variables, x and y (points in the xy-plane), seeks to determine the values
of two elements (coefficients in the linear regression equation y = a + bx)
such that the absolute value of the largest residual is a minimum. He
accomplishes this minimization by means of an iterative scheme. He shows that
a line in the plane may be such that one, two, or three of the given points
deviate from it by the maximal amount, but that for the line which is the
unique solution of the problem there are three such points with the maximal
residual, two residuals of one sign and one of the other [cf. Laplace (179?)].
He proves four theorems concerning the possible system of values of the
elements, and gives a geometric interpretation of each in terms of the number
of faces, edges, and vertices of a convex polyhedron. This paper is a
condensation of a memoir presented in 1814; the entire memoir, which was
published later [Cauchy (1831)], includes a generalization of the theory in
two directions, considering the case in which the function of the elements
which represents the errors is a power series and the number of elements
exceeds two. Cauchy shows that the number of residuals whose absolute value is
equal to the maximum always exceeds by at least one the number of variable
elements.
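
The three-point property described above can be illustrated with a short sketch. It is not Cauchy's iterative scheme: it is a brute-force search, resting on the fact that the two maximal residuals of like sign force the minimax line to be parallel to the line through some pair of the given points. The function name `minimax_line` and the data are illustrative assumptions.

```python
from itertools import combinations

def minimax_line(points):
    """Return (a, b, d) for the line y = a + b*x that minimizes the largest
    absolute residual d, by trying the slope defined by every pair of points."""
    best = None
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 == x2:
            continue  # a vertical pair defines no finite slope
        b = (y2 - y1) / (x2 - x1)
        # For a fixed slope, the best intercept is the midrange of y - b*x.
        offsets = [y - b * x for x, y in points]
        hi, lo = max(offsets), min(offsets)
        a, d = (hi + lo) / 2, (hi - lo) / 2
        if best is None or d < best[2]:
            best = (a, b, d)
    return best

# Illustrative data (assumed, not from the source).
pts = [(0, 0), (1, 1), (2, 0), (3, 0.2)]
a, b, d = minimax_line(pts)
residuals = [y - (a + b * x) for x, y in pts]
```

For these points three residuals attain the maximal absolute value, two negative and one positive, in agreement with the property stated in the text.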

Two memoirs by Poisson (1824,1829), large parts of which are reproduced
in a later work [Poisson (1837)] are, according to Merriman (1877), pp. 175-
176, a commentary on the fourth chapter of Laplace (1812). Merriman quotes
Todhunter (1869) to the effect that Poisson confines himself to the case in
which one element is to be determined from a large number of observations,
but treats it in a more general manner than Laplace, dropping the assumptions
that positive and negative errors are equally likely and that the law of
facility of error is the same for every observation.

James Ivory (1825,1826) gives four demonstrations of the method of least
squares. His first paper is divided into three parts. In the first part he
gives two of his demonstrations, neither of which is based on the theory of
probability, which he considers irrelevant. In the second part, he discusses
the probability of errors, failing to recognize that the probability of any
definite error for a continuous distribution must be an infinitesimal, and
making no distinction between true errors and residuals. In the third part,
he attempts to show that the method of least squares cannot give the most
advantageous or probable results unless the law of facility of error is the
normal $\varphi(x) = c\,e^{-h^2 x^2}$. On page 163, he makes the following
statement concerning the demonstration of Laplace (1812), Book II, Ch. iv,
Art. 20: "* * * whatever merit it may have in other respects, [it] is neither
more nor less general than the other solutions of the problem." Later authors
have regarded Ivory's demonstrations as unsatisfactory, and the present writer
shares this opinion. Glaisher (1872) has analyzed Ivory's criticism of Laplace,
which he regarded as a result of Ivory's failure to understand the
demonstration of Laplace. It appears to the present writer that Glaisher was
guilty of the same fault. In modern terminology, what Laplace actually proved
in the article cited by Ivory [see also Laplace (1810,1811a)] is that the
method of least squares is asymptotically most advantageous for any error
distribution which is well enough behaved so that its mean is asymptotically
normally distributed. He did not claim to have shown that it is most
advantageous [best] for any finite number of observations from a non-normal
error distribution, but recommended it as advantageous [good] and
computationally convenient whenever the number of observations is large.
Ivory's second paper contains his fourth demonstration, regarded by Ellis
(1844) as no more satisfactory and by Merriman (1877) as still more absurd
than the previous ones.

Georg Wilhelm Muncke (1825) gives an exposition of the method of least
squares based largely on the demonstration of Gauss (1823). He proposes the
use of the arithmetic mean of the observations remaining after those farthest
from the mean have been excluded. Gauss (1828) gives a method of solving the
normal equations which arise in carrying out the method of least squares.
Whittaker & Robinson (1924) state that this method is substantially equivalent
to reduction of a quadratic form to a sum of squares.
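
As a rough illustration of what solving the normal equations involves, the sketch below forms them for a straight-line fit and solves them by successive elimination and back substitution. It is a generic elimination routine, not a reproduction of Gauss's own notation or bookkeeping; the function names and the data are assumptions made for the example.

```python
def normal_equations(rows, rhs):
    """Form N = A^T A and g = A^T y for the observational equations A x ~ y."""
    k = len(rows[0])
    N = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    g = [sum(r[i] * y for r, y in zip(rows, rhs)) for i in range(k)]
    return N, g

def solve_by_elimination(N, g):
    """Solve N x = g by elimination and back substitution.
    No pivoting is done: N is assumed symmetric and positive definite."""
    k = len(g)
    N = [row[:] for row in N]  # work on copies
    g = g[:]
    for p in range(k):
        for r in range(p + 1, k):
            f = N[r][p] / N[p][p]
            for c in range(p, k):
                N[r][c] -= f * N[p][c]
            g[r] -= f * g[p]
    x = [0.0] * k
    for p in reversed(range(k)):
        s = sum(N[p][c] * x[c] for c in range(p + 1, k))
        x[p] = (g[p] - s) / N[p][p]
    return x

# Fit y = a + b*x to four illustrative observations (assumed data).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 1.0, 1.0, 2.0]
rows = [[1.0, x] for x in xs]
N, g = normal_equations(rows, ys)
a, b = solve_by_elimination(N, g)
```

For a positive definite system each pivot is positive, and each elimination step amounts to splitting one more square off the quadratic form, which is the sense of the equivalence noted by Whittaker & Robinson.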

Carl Friedrich Hauber (1830a) extends the work of Gauss (1816,1823) on
the estimation of the precision of observations to the case of s observations
arising from populations having (possibly) different dispersions. The situation
in which all come from the same population is included as a special case. He
considers estimators based on the square root of the mean of the squares of
the errors, the mean absolute error, and the median absolute error. He
compares the precision of these estimators when the law of the facility of
error is the normal (Gaussian) law. The mathematical expressions which he
obtains agree with those given by Gauss (1816), as do the numerical results
for the root-mean-square error and the mean absolute error. For the probable
error of the median absolute error M he gives $0.78671\,w/\sqrt{s}$ (where $w$
is the true value of M), which he approximates by $0.78671\,M/\sqrt{s}$,
without mentioning that Gauss incorrectly calculated the numerical coefficient
(for which he gave the correct mathematical expression
$e^{\rho^2}\sqrt{\pi/8}$, where $\rho = 0.4769363$) as 0.7520974.
[Hauber's value is correct to within a unit in the fifth decimal place.]
Hauber states as an advantage of the median absolute error that it is
independent of the law of facility of error. This is true in the sense that it
is unbiased for any law of error. As for most so-called "distribution-free"
estimators, however, its precision and its efficiency relative to other
estimators do depend on the law of error. Hauber (1830b) extends the results
of Poisson (1824,1829) to the case of two or more quantities to be determined.
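
The numerical coefficient discussed above can be checked directly. The sketch below assumes that the expression is $e^{\rho^2}\sqrt{\pi/8}$, with $\rho$ defined by $\operatorname{erf}(\rho) = 1/2$ (the usual probable-error constant); the helper name `solve_rho` is illustrative.

```python
import math

def solve_rho(tol=1e-12):
    """Find rho with erf(rho) = 1/2 by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if math.erf(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

rho = solve_rho()
# Coefficient of M / sqrt(s) in the probable error of the median absolute error.
coefficient = math.exp(rho ** 2) * math.sqrt(math.pi / 8)
```

Under these assumptions the computed coefficient agrees with the values 0.78671 and 0.786716 quoted in the text, and not with 0.7520974.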

Hauber (1830-32), in a six-part article, discusses the theory of averages.
In the first three parts he deals with arithmetic means, both population means
(expected values) and sample means, along with the root-mean-square error and
probable error of the latter and various applications. In the fourth he
discusses cases in which the values of the quantities of interest are not
given by the observations, but functions of these quantities. In the fifth and
sixth parts he deals with methods of solving the set of observational
equations, greater in number than the number of quantities to be determined.
He gives three methods: (1) the method of averages, which he attributes to
Tobias Mayer; (2) the method of least squares; and (3) a hybrid method in
which the number of equations is reduced by Mayer's method to a manageable
number (still greater than the number of quantities to be determined), which
are then solved by the method of least squares.
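
A minimal sketch of the method of averages for a straight line y = a + bx follows. It assumes the simplest grouping rule, namely ordering the equations by x and splitting them into two halves; the historical procedure allowed other groupings, and the function name and data are illustrative.

```python
def method_of_averages(xs, ys):
    """Fit y = a + b*x by splitting the observational equations into two
    groups (ordered by x), summing each group, and solving the two sums."""
    pairs = sorted(zip(xs, ys))
    half = len(pairs) // 2
    groups = [pairs[:half], pairs[half:]]
    # Each summed equation reads: n*a + sum(x)*b = sum(y)
    (n1, sx1, sy1), (n2, sx2, sy2) = [
        (len(g), sum(x for x, _ in g), sum(y for _, y in g)) for g in groups
    ]
    det = n1 * sx2 - n2 * sx1  # zero only if both groups have the same mean x
    a = (sy1 * sx2 - sy2 * sx1) / det
    b = (n1 * sy2 - n2 * sy1) / det
    return a, b

# Illustrative data (assumed, not from the source).
a_avg, b_avg = method_of_averages([0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 1.0, 2.0])
```

The hybrid method mentioned in the text would form more than two group sums and then treat the reduced set of equations by least squares.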

C. von Riese (1830) summarizes the results of the paper by Gauss (1823),
which contains Gauss' second "proof" of the method of least squares, and the
1828 supplement thereto, which contains his algorithm for solving the normal
equations. The author mentions the earlier "proof" by Gauss (1809) based on
the postulate, which von Riese attributes to Cotes (1722?), that the
arithmetic mean is the best average of a number of observations. He also
refers to the work of Laplace (1810,1811a,1812) on the method of least squares
and on the rival method based on minimizing the sum of the absolute errors, as
well as the earlier work of Boscovich (1757?) on the latter method and the
articles of J. Bernoulli (1785) on various averages and of Muncke (1825) on
the method of least squares.

Johann Franz Encke (1832-34) gives an exposition, in three parts, of
the method of least squares. The first part contains the "proof" by Gauss
(1809), an attempt by Encke to demonstrate that the arithmetic mean is
necessarily the best average of a number of observations, a discussion of
weights and probable errors, and two tables of the probability integral
$(2/\sqrt{\pi})\int e^{-t^2}\,dt$. In this part, the author also gives a
proof, which he credits to his colleague Dirichlet, of the expression for the
probable error of the median absolute deviation, which Gauss (1816) had given
without proof. He gives the value of the numerical coefficient in this
expression as 0.786716 [correct to six decimal places], as compared with the
values 0.78671 [Hauber (1830a)] and 0.7520974 [Gauss (1816)], neither of which
he mentions, as well as a numerical example of application of this method to
actual data. The second part of the article contains Gauss' algorithm for the
solution of normal equations and his method of determining weights, while the
third deals with conditioned observations.

Cauchy (1837) states the following problem (pp. 460-461 of the English
translation): "* * * I suppose that a function of x represented by y is
developed in a converging series arranged according to the ascending or
descending powers of x, or according to the sines and cosines of an arc x, or,
more generally, according to other functions of x which I shall represent by
$\varphi(x) = u$, $\chi(x) = v$, $\psi(x) = w$; so that we have
(1) $y = au + bv + cw + \cdots$, where $a, b, c, \ldots$ are constant
coefficients. Now the question is, 1st, how many terms of the second member of
the equation (1) are to be employed, in order that the difference between it
and the exact value may be very small, and capable of being compared with the
errors to which the observations are liable; 2ndly, to determine in numbers
the coefficients of the terms retained, or, in other words, to find the
approximate value just mentioned." The data consist of n values of y
represented by $y_i$ $(i = 1, \ldots, n)$ and the corresponding values of
$x_i$ (and hence of $u_i, v_i, w_i, \ldots$) related by n equations
(2) $y_i = a u_i + b v_i + c w_i + \cdots$. The author proposes successive
approximations based on neglecting all but one, two, ... terms on the
right-hand side of equations (2), the process continuing until the residuals
are comparable to the inevitable errors of observation. Cauchy's method must
be considered as one of the alternatives to the method of least squares.

Gotthilf Heinrich Ludwig Hagen (1837) advocates the use of the method of
least squares [Legendre (1806), Gauss (1809,1823)], which he explains in
considerable detail. He does mention, however, the use by Prony (1804), before
the method of least squares was known, of the method of Laplace (1799) based
on the criteria of Boscovich, as well as the work of Lambert (1765a). He
also discusses the suppression of outlying observations, which he strongly
opposes unless there is some reason other than the fact that they deviate
considerably from the remaining ones, and the assignment of weights to
individual observations. Friedrich Wilhelm Bessel (1838) discusses the
probability of errors of observation and the method of least squares as
developed by Laplace (1812), Gauss (1823), and others. He shows that the
normal law of error is not to be regarded as an a priori rule, free from
exception, and throws new light on the conditions under which it holds. Bessel
and Johann Jakob Baeyer (1838) join Hagen (1837) in taking a firm stand
against the rejection of outlying observations, which had been advocated and
practiced by such earlier authors as Boscovich (1755, 1757), Lambert
(1760,1765a), and Legendre (1805). S. Stampfer (1839), on the other hand, has
no qualms about rejecting observations. Of nine determinations of the ratio of
the lengths of the Vienna fathom and the meter, by various methods, he rejects
the two smallest on the grounds that both were obtained by comparisons with
the French standard half toise, which leads him to suspect a constant error in
the standard half toise. He is still not satisfied with the result, and
proceeds to discard also the smallest of the remaining values, apparently
for no other reason than its discrepancy from the six still remaining.
Christian Ludwig Gerling (1843) gives an excellent treatment of the method
of least squares. He recommends great caution in discarding observations,
but says even so that "there remain observations which we must discard after
the fact, because we hold it to be more probable that a gross blunder has
occurred than that an unavoidable error can produce such a large deviation."
William Fishburn Donkin (1844) starts from the assumption that the
weight of an observation is proportional to the square of its precision
(inversely proportional to its variance) and, as one would expect, he reaches
the same conclusion as the one Gauss (1823) reached by assuming a squared
error loss function, namely that the method of least squares should be used,
independently of the law of facility of error. Robert Leslie Ellis (1844)
examines in detail the demonstrations of the method of least squares by
Gauss (1809), Laplace (1812), Gauss (1823) and Ivory (1825,1826). He concludes
that Laplace's objection to Gauss' first demonstration, based on the postulate
that the arithmetic mean is the best average to take of a number of
observations, is justified. He regards Laplace's demonstration and Gauss'
second as somewhat more satisfactory, but endeavors to show that none of the
three tends to prove that the results of the method of least squares are the
most probable of all possible results. He finds Ivory's demonstrations, which
are not based on the theory of probabilities, not at all conclusive.

Lambert Adolphe Jacques Quetelet (1846) gives an elementary exposition
of the theory of means and of the laws of error, in which he advocates use
of the interquartile distance as a measure of the probable error. He uses
this method of estimating the probable error of the right ascension of the
North Star [repeated measurements of the same quantity] and of the chest
measures of Scottish soldiers [measurements of related quantities].

Augustus De Morgan (1847) gives an extensive treatment of the method
of least squares which consists largely of a translation of and comments on
the treatment of Laplace (1812). In cases in which the relative precision
of the observations is in doubt, he proposes an iterative procedure in which
one makes the best possible initial estimate of the weights, finds the most
probable result, then adjusts the weights accordingly, and repeats the process
until assumed and deduced weights agree.
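
An iteration of this general kind can be sketched as follows. The rule used here for deducing weights (the reciprocal of each group's mean squared residual about the current estimate) and the grouping of readings by observer are assumptions made for illustration; the source does not specify the details of the procedure.

```python
def reweighted_mean(groups, tol=1e-10, max_iter=100):
    """Iteratively reweighted mean of several groups of readings of one quantity.
    Weights are rescaled so that the first group always has weight 1."""
    weights = [1.0] * len(groups)  # initial guess: equal precision
    for _ in range(max_iter):
        num = sum(w * sum(g) for w, g in zip(weights, groups))
        den = sum(w * len(g) for w, g in zip(weights, groups))
        estimate = num / den
        # Deduced weight: reciprocal of the group's mean squared residual
        # (fails if a group agrees exactly with the estimate).
        deduced = [len(g) / sum((v - estimate) ** 2 for v in g) for g in groups]
        deduced = [w / deduced[0] for w in deduced]
        if max(abs(a - b) for a, b in zip(weights, deduced)) < tol:
            return estimate, deduced  # assumed and deduced weights agree
        weights = deduced
    return estimate, weights

# Two hypothetical observers measuring the same quantity (assumed data).
careful = [10.1, 9.9, 10.0, 10.2]
rough = [9.0, 11.5, 10.8, 9.3]
estimate, weights = reweighted_mean([careful, rough])
```

With these data the rough observer ends up with a small relative weight, and the estimate moves from the unweighted mean toward the mean of the careful observer's readings.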

Sir John Frederick William Herschel (1850) gives a demonstration of the
method of least squares, similar to that of Adrain, based on the assumption
that the components of error in orthogonal directions are independent. In
commenting on the work of Quetelet (1846), he questions by what numerical
process the latter obtained his averages of the chest measures of Scotch
soldiers and the heights of French conscripts, pointing out that his values do
not agree with those of either the arithmetic mean or the median. Ellis (1850)
discusses Herschel's proof of the method of least squares, which he regards as
unsatisfactory, and explains and defends Laplace's method. Donkin (1851)
offers some critical remarks on the theory of least squares, and especially on
the remarks of Ellis. Donkin says that Herschel's proof "should be treated
with respect" and that the method of least squares may be used, if for no
other reason, because "it is a very good method", as shown by Gauss (1823).

Jules Bienaymé (1852) reviews the development of the theory of least
squares from the early work of Legendre (1805), Gauss (1809,1823) and
Laplace (1811a,1812). He considers the modifications and generalizations
required when the observations are not all equally precise and when not one
but several variables are to be estimated, with particular emphasis on the
precision of the results of applying the method of least squares to these
cases. A later paper of Bienaymé (1858) is practically identical with this
one.

Benjamin Peirce (1852) proposes the first objective criterion for the
rejection of observations, based on the principle that observations in
question should be rejected when the probability of the system of errors when
they are retained is less than that of the system of errors obtained by their
rejection multiplied by the probability of making exactly so many abnormal
i*y  **  jpiBdTV^wc  .  1&S  i  JPff'.gntc  ®5  CSj-tegME  SOS  (SflSlS?  S  SSCOCSeCi  Sy  13ES3KT 

v?y»  ge  oesttsse,  ffSaer  £!333j  gags  ac  ssaeUsEr  subset  a€  these 
saopcsad  cp  to  *£g£  trng- 

Cauchy (1853a) states that his method of interpolation [Cauchy (1837)]
can be used to determine several unknown quantities from a saagwart system
of equations, with results nearly as accurate as by the method of least
squares. Bienaymé (1853a) argues, however, that the two methods are completely
different amrg cths gh»v a exists. Cauchy (1853b) maintains that
is ijpg»^rig»riirr^ his method of interpolation is preferable to the method
of least squares. Cauchy (1853c) claims that his method is the shortest, and
that the method of least squares gives the most probable results only under
certain conditions, which are, according to Cauchy (1853d,e), that the law
of facility of error is the same for all the errors, that no limits can be
assigned to the magnitude of an error, and that the probability of an error
x is proportional to $e^{-h^2 x^2}$. Cauchy (1853f) shows that the most
probable values may sometimes differ from those found by the method of least
squares. Bienaymé (1853b) reviews some of Cauchy's articles [Cauchy
(1853d,e,f)] and maintains that the mean of the sum of squares of the errors
is under all circumstances a measure of precision of the observations. Cauchy
(1853g) shows that the system of weights which makes the largest error to be
feared in a mean as small as possible often differs considerably from that
given by the method of least squares.

Joseph Bertrand (1855) offers certain historical and critical remarks
on presenting a copy of his translation into French of the Latin memoirs of
Gauss. An English translation by Hale F. Trotter (1957) has since been
prepared from Bertrand's French translation.

Benjamin Apthorp Gould, Jr. (1855) gives tables for Peirce's criterion
which are more extensive than those of Peirce (1852) and carried to more
significant figures. H3e ranceis Pens's 3*© EsanciSes, amc is aras crec oas-

^Tr-Vix  irfhyr  qiWTry  pn»  ggtjqt  SCOnH  C*  TejiSCZsi,  vitie?C3S  &E2ZZS  reiSCtOS 

a©;  cowerer,  3§aey  0525}  ire  pciistsd  cct  that  aeitigr  result  is  trnssarartfs', 
efffry  Q-r3~j  csss  Pfeircres  inrasrrect  rebx  cf  Use  staafred  QeraasigiB- 

Hanrisrrr LJswe {£$53} advocates, in effect, the use of the midrange in
averaging meteorological observations. This average is still used by
meteorologists today, the so-called mean daily temperature being the arithmetic
mean of the highest and lowest temperature recorded during a 24-hour period.

George Biddell Airy (1856), after studying the papers of Peirce (1852)
and Gould (1855) on Peirce's criterion for the rejection of doubtful
observations, summarizes his conclusions as follows: "1. The mathematical
theory of probabilities fails in all questions applying to errors of extreme
magnitude. 2. No considerations of the magnitude of residual errors per se
will justify us in rejecting a result. 3. We are justified in rejecting a
result only when, from the best estimate that we can form of the extent of
action of the various causes which can produce error, we find that the
combination of these causes of error cannot possibly produce the discordance
in question; -- 4. And when we perceive that other causes may have intervened,
whose nature is such that they cannot be recognized as occurring in the
ordinary series of observations." Joseph Winlock (1856) answers Airy's
criticisms of Peirce's criterion, summing up the case in its favor as follows:
"Regarding the probability of an error as a function of its magnitude, we are
enabled to find the probability of any system of residual errors, and by the
comparison of the systems of errors before and after rejection in accordance
with the rule of the Criterion, we can decide, within safe limits, whether the
probability of our final result is lessened by retaining the doubtful
observations."

Joseph Petzval (1857) holds that the method of least squares is not
applicable in optics, because of the failure of the underlying assumptions
that positive and negative errors of equal magnitude are equally probable and
that all observations are made under equally favorable conditions by equally
skilled observers with equally good instruments. He questions especially
the second of these assumptions. He proposes instead what he calls "the
method of numerically equal maxima and minima" in which the maximum of the
absolute values of the residuals is minimized. He points out that this is
equivalent to minimizing the sum of the $2n$th powers of the residuals, where
n is an integer which tends to infinity as a limit.
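
The limiting behavior can be illustrated in the simplest case, a single quantity estimated from direct observations, where the minimax solution is the midrange. The sketch below minimizes the sum of the pth powers of the absolute residuals by ternary search; the function name, the search method, and the data are assumptions chosen for the illustration.

```python
def power_minimizer(data, p, iters=200):
    """Value c minimizing sum(|x - c|**p), found by ternary search.
    The objective is convex for p >= 1, so the search is valid."""
    lo, hi = min(data), max(data)

    def cost(c):
        return sum(abs(x - c) ** p for x in data)

    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if cost(m1) < cost(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2

# Illustrative observations (assumed, not from the source).
data = [0.0, 1.0, 4.0, 10.0]
c2 = power_minimizer(data, 2)      # least squares: the arithmetic mean
c200 = power_minimizer(data, 200)  # high even power: close to the midrange
midrange = (min(data) + max(data)) / 2
```

For p = 2 the minimizer is the arithmetic mean, 3.75; for p = 200 it is already very close to the midrange, 5, which minimizes the largest absolute residual.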

C. G. von Andrae (1860) studies the problem of choosing one, two, three,
... of a series of n equally reliable observations to be used instead of all
n observations in determining the most advantageous value of the measured
quantity. In choosing a single observation, he uses the principle [Laplace
(1799)] of minimizing the sum of the absolute values of the errors, dropping
the Boscovich condition that the sums of positive and negative errors be
equal in magnitude. Hence he chooses the median, which he defines, for a
sample of size n, as the $m$th ordered observation, where $m \approx n/2$. By
analogy, if s observations are to be used, he chooses the $m_i$th ordered
observations $(i = 1, 2, \ldots, s)$, where $m_i \approx in/(s+1)$.

Airy (1861) gives a discussion of the method of least squares, without
mentioning any rival methods, and makes further comments along the lines of
those in his earlier paper [Airy (1856)] on the rejection of doubtful
observations.

Charles A. Schott (1862) gives a free translation into English of the
paper of Cauchy (1837) on Cauchy's method of interpolation. ISHsjbe Pitt
Qc&cm&sd Sgstl&s. (1862) applies this method to actual observations in the
fields of physics and chemistry.

William Chauvenet (1863), in an appendix to the second volume of his
treatise on astronomy, gives a detailed discussion of the method of least
squares. The author discusses Peirce's criterion for the rejection of doubtful
observations and proposes his own criterion for rejecting a single
observation. The latter is based on the principle that, since the number of
errors numerically greater than $\kappa$ that may be expected to occur in n
observations is $2n\int_{\kappa}^{\infty} \varphi(t)\,dt = n\Psi(\kappa)$,
where $\varphi(t)$ is the normal law of error, an observation deviating from
the mean by an amount greater than $\kappa$ should be rejected if the quantity
$n\Psi(\kappa)$ is less than 1/2, since such an error "will have a greater
probability against it than for it." The appendix and related tables were
reprinted separately in 1868.
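
A minimal sketch of a single-observation test of this kind is given below. It assumes a normal law of error with the sample mean and the sample standard deviation (divisor n - 1) substituted for the unknown parameters; these choices, the function name, and the data are illustrative and not taken from Chauvenet's tables.

```python
import math

def chauvenet_single(data):
    """Test the most deviant observation: reject it when the expected number
    of equally large (or larger) deviations in n observations is below 1/2."""
    n = len(data)
    mean = sum(data) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    suspect = max(data, key=lambda x: abs(x - mean))
    kappa = abs(suspect - mean) / sd
    # Two-sided normal tail probability P(|error| > kappa * sd).
    tail = math.erfc(kappa / math.sqrt(2))
    expected = n * tail
    return suspect, expected, expected < 0.5

# Illustrative readings with one discordant value (assumed data).
readings = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9, 10.0, 12.0]
suspect, expected, reject = chauvenet_single(readings)
```

For these readings the value 12.0 is flagged for rejection, since far fewer than half an observation of that size would be expected among ten.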

Augustus De Morgan (1864) declares that the arithmetic mean is the best
average of a series of observations because the most probable result is the
arithmetic mean plus corrections of which we have no knowledge, either as
to sign or value, and no means of getting any, so that there is no reason
for supposing that the true value lies on one side of the arithmetic mean
rather than the other.

Isaac Todhunter (1865), in his history of probability, summarizes the
work of various writers on the theory of errors, including Simpson (1757),
Lagrange (1774), D. Bernoulli (1778), Euler (1778), J. Bernoulli (1785), and
Srapcg (1325), as well as numerous writings of Laplace (17744723.4735,
1799, 1810, 1811a,b, 1812). The last four of these deal primarily with the
method of least squares, but Todhunter makes it clear that Laplace never
entirely abandoned some of his earlier methods.

Edward James Stone (1868) defines what he calls a modulus of carelessness,
n, which, for a given observer and a given class of observations, expresses
the average number of observations which that person makes with one mistake.
Stone proposes a criterion for rejection of observations which, with n equal
to twice the number of observations, is equivalent to Chauvenet's.

Wilhelm Jordan (1869) extends Gauss' table of factors for computing the
probable error and its probable uncertainty from the root of the mean of
the powers of the absolute values of deviations from the true value up
through the tenth power and corrects Gauss' factors for the median, which he
shows to give a slightly less (rather than more) precise estimate than the
other method for the sixth power.

Todhunter (1869) develops Laplace's treatment of the method of least
squares and demonstrates that some of the results which Laplace obtained for
the case of two elements hold for the case of any number of elements.

Cleveland Abbe (1871) gives a historical note on the method of least
squares in which he points out that although Legendre (1805) was the first
to publish the method and Gauss had used it since 1795 (though he did not
publish it until 1809), it was independently developed by Adrain (1808)
in America. The author reprints a portion of Adrain's original investigation,
gives interesting biographical notes on Adrain, and summarizes the results
of two of his later papers [Adrain (1818a,b)] in which he applies the method
of least squares. G. Zachariae (1871) gives an excellent textbook treatment
of the method of least squares.

James Whitbread Lee Glaisher (1872) gives a history of the method of
least squares, including an account and a critical evaluation of the
contributions of Legendre, Adrain, Gauss, Laplace, Ivory, Ellis, De Morgan and
others. He offers an alternative to the rejection of observations in the form
of an iterative procedure in which the weights of the observations are
adjusted after each iteration as proposed by De Morgan (1847). Last, but not
least, he proves that if errors are distributed according to Laplace's first
law $[f(x) = (m/2)\,e^{-m|x|}]$, the median of the observations is the most
probable true value. He does this by showing that the probability [density] of
the true value x is proportional to exp [-m(the sum of the absolute values of
the deviations of the observations from x)] and that that sum is a minimum
when taken about the median [the middle one of an odd number of observations
or any value between the middle two of an even number of observations].
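
The result can be checked numerically. The sketch below evaluates the log-likelihood under Laplace's first law on a grid of candidate true values and compares the maximizing value with the sample median; the data, the grid, and the value m = 1 are illustrative assumptions.

```python
import math
import statistics

def laplace_log_likelihood(center, data, m=1.0):
    """Log-likelihood of `center` when errors follow f(e) = (m/2) * exp(-m*|e|)."""
    return len(data) * math.log(m / 2) - m * sum(abs(x - center) for x in data)

# An odd number of illustrative observations (assumed data).
data = [2.0, 3.5, 4.0, 7.0, 15.0]
median = statistics.median(data)

# Candidate true values 0.00, 0.01, ..., 20.00.
grid = [i / 100 for i in range(0, 2001)]
best_on_grid = max(grid, key=lambda c: laplace_log_likelihood(c, data))
```

Because the log-likelihood depends on the candidate value only through the sum of the absolute deviations, the grid maximum falls at the median, 4.0, even though the observation 15.0 lies far from the rest.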

Friedrich Robert Helmert (1872), in the first edition of a book on the
adjustment computation by the method of least squares, gives a proof, following
that of Gauss (1816), that the probable error can be determined more precisely
from the mean of the squares of the errors of a number of observations (assumed
to have come from a normal distribution) than from the mean of the absolute
values of the errors. In later editions, he adds a section on the theory of
the maximal error and its use in the rejection of observations.


Glaisher (1873) elaborates on the alternative to the rejection of
observations proposed in his earlier paper [Glaisher (1872)]. He also examines
the criterion proposed by Stone (1868) for the rejection of outlying
observations, and criticizes it on two grounds: (1) any rejection criterion
based on the supposition of the validity of the arithmetic mean is
inconsistent; (2) even among such criteria Stone's is not the most desirable
one and is impractical because of the practical impossibility of determining
the value of n, it being assumed that the observer makes one mistake in n
observations. In two papers published the same year, Stone (1873a,b) justifies
the use of the arithmetic mean and the normal law of error on the basis of the
axiom that all direct measures are of equal value and examines in detail the
objections raised by Glaisher (1873) to the author's criterion [Stone (1868)].
He points out that his criterion is relatively insensitive [robust, as modern
statisticians would say] to moderately large variations in n. He insists that
even if Glaisher's assumptions are granted, Glaisher has not maximized the
right expression, and hence has not found the correct weights for the
observations. Further notes by Glaisher (1874) and Stone (1874) appear to have
generated more heat than light.

Todhunter (1873) reviews the work of various authors, especially Boscovich
and Laplace, on methods used to find the equation y = a + bx of the
best-fitting straight line involved in the determination of the ellipticity of
the earth from measurements of degrees of meridian and lengths of a seconds
pendulum at widely separated points on the earth's surface. He writes: "I
presume that neither of the methods which Laplace [(1799)] discusses would now
be practically used in such calculations, but the method of least squares".

Gustav Theodor Fechner (1874) shows that, while the sum of squares of
deviations is a minimum when taken from the arithmetic mean, the sum of the
absolute deviations is a minimum when taken from the median. He makes a
remark which leads to the conclusion that he was unaware that the latter
fact was known to von Andrae (1860) and was proved by Glaisher (1872). He
also discusses power means, which he defines as values such that the sums
of powers of deviation are minimal when taken from them, and probability
laws under which such power means are valid averages.
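
The two minimum properties stated by Fechner can be checked numerically. The
following Python sketch is an illustration only: the observations are
hypothetical, the function names are not from any of the works cited, and the
grid search is a crude stand-in for the analytical proofs.

```python
from statistics import mean, median

def sum_of_squares(obs, c):
    """Sum of squared deviations of the observations from c."""
    return sum((x - c) ** 2 for x in obs)

def sum_of_abs(obs, c):
    """Sum of absolute deviations of the observations from c."""
    return sum(abs(x - c) for x in obs)

# Hypothetical observations, one of them discordant.
obs = [2.0, 3.0, 5.0, 7.0, 23.0]

# Candidate centres 0.0, 0.1, ..., 25.0.
grid = [i / 10 for i in range(251)]
best_sq = min(grid, key=lambda c: sum_of_squares(obs, c))
best_abs = min(grid, key=lambda c: sum_of_abs(obs, c))

print(best_sq, mean(obs))     # minimiser of squares vs. arithmetic mean
print(best_abs, median(obs))  # minimiser of absolute values vs. median

# With an even number of observations, every value between the middle
# two gives the same sum of absolute deviations.
even_obs = [1.0, 2.0, 4.0, 10.0]
print([sum_of_abs(even_obs, c) for c in (2.0, 3.0, 4.0)])
```

The discordant value 23 pulls the arithmetic mean well above the bulk of the
observations, while the median is unaffected by its magnitude.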

Herve Auguste Etienne Albans Faye (1875) discusses various justifications
of the method of least squares. He points out that Gauss and Legendre
deduced it from the accepted opinion that the most probable value of a
quantity of which a number of observations have been made is their arithmetic
mean, while Laplace and others justified it on the basis that the errors are
due to a large number of causes each contributing only a small part of the
resultant error. He insists that the law of probability of errors cannot
be established a priori, on the basis of a hypothesis or of a generally
accepted opinion, in spite of the extreme elegance of the proof of Gauss,
but must be established a posteriori, from a direct study of the facts. He
gives an example in which, because of a systematic error, the method of
least squares gives an extremely misleading result; quite rightly, however,
he does not blame this result on the method but on the observations. Hermann
Laurent (1875), commenting on the same question, says that the Gaussian law
of error should never be accepted a priori; on the contrary, one ought to
reject it, because it assigns positive probabilities to impossibly large
errors. "Who is the astronomer", he inquires, "who makes an error of 361
degrees in measuring an angle?" He makes a study of 1444 measurements of an
angle of approximately 16°, and concludes that the observations cast doubt on
the exactness of the Gaussian law, and that therefore one ought to reject the
method of least squares when one has only a small number of observations.

Francis Galton (1875) proposes the use of the median as a measure of
central tendency and of the difference between the median and one of the
quartiles, or the average distance between the median and the two quartiles,
as a measure of dispersion (probable error).

Truman Henry Safford (1876) gives rules for good observation based on
the method of least squares, and hints for abbreviating computations. Mansfield
Merriman (1877) gives a chronological bibliography, containing 408 titles and
covering the period 1722-1876, on the method of least squares and rival
methods, with valuable historical and critical notes.

Benjamin  Peirce  (1878)  gives  a  fuller  explanation  of  the  criterion  which 
he  proposed  over  a  quarter  of  a  century  earlier  [Peirce  (1852)].  Charles  A. 
Schott  (1878)  makes  favorable  remarks  on  Peirce's  criterion,  based  on  twenty
years  of  use  in  various  investigations. 

Francis Ysidro Edgeworth (1883a) questions the universal and indiscriminate
use of the normal (Gaussian) law of error in the following words:
"The Law of Error is deducible from several hypotheses, of which the most
important is that every measurable (physical observation, statistical number,
&c.) may be regarded as a function of an indefinite number of elements, each
element being subject to a determinate, although not in general the same,
law of facility. Starting from this hypothesis, I attempt, first, to reach
the usual conclusion by a path which, slightly diverging from the beaten
road, may afford some interesting views; secondly, to show that the
exceptional cases in which that conclusion is not reached are more important
than is commonly supposed" (pp. 300-301). Later in the same paper
(pp. 305-306), he writes: "I submit, in the absence of evidence to the
contrary, that non-exponential [non-Gaussian] laws * * * do occur in rerum
natura, that the 'ancient solitary reign' of the exponential [Gaussian] law of
error should come to an end." Edgeworth (1883b) begins a paper on the method
of least squares with a philosophical discussion of the differences between
the approaches of Gauss and Laplace, between most probable results and most
advantageous results, and between minimizing mean square errors and mean
absolute errors. He proceeds to the question of how to treat outlying
observations. He proposes a method of weighting the observations which is the
same as that proposed by Stone (1873b). In a later paper [Edgeworth (1887a),
p. 373 (footnote)], he acknowledges Stone's priority, of which he was unaware
at the time he wrote this paper.

The year 1884 saw the publication of two books on the adjustment of
observations by the method of least squares. Both authors also consider the
question of the rejection of outlying observations. Merriman (1884) advocates
the use of Chauvenet's criterion, but he also discusses two other criteria--
Peirce's and a new one based on Hagen's deduction of the law of error.
Moreover, he states (p. 169): "In general, it should be borne in mind that the
rejection of measurements for the single reason of discordance with others
is not usually justifiable unless that discordance is considerably more than
indicated by the criterions. A mistake is to be rejected, and an observation
giving a residual greater than 4r or 5r [r = probable error] is to be regarded
with suspicion, and be certainly rejected if the notebook shows anything
unfavorable in the circumstances under which it was taken". Thomas Wallace
Wright (1884) advocates rejecting an observation whose residual is greater
than five times the probable error (or three times the mean square error); in
the second edition [Wright & John Fillmore Hayford (1906)], this rule is
restated in slightly modified form.

4.  THE  AWAKENING  (1885-1945) 

Edgeworth (1885) discusses the choice of measures of central tendency
and of variability. He insists that, while the arithmetic mean and the
root-mean-square deviation from it are most accurate for samples from a normal
population, other measures (median and mean absolute deviation or quartile
deviation) are more convenient and little less accurate, while for other
populations they may be more accurate as well as more convenient. With regard
to measures of variability he writes (pp. 188-189): "When the observations
really conform to a [normal] probability-curve, there are several formulae
for the modulus [$c = \sigma\sqrt{2}$, where $\sigma$ is the standard
deviation] which are little inferior to the above [root-mean-square error] in
respect of accuracy, and two of them which are superior in respect of
convenience. If we call the preferential method the method of mean square of
errors, one of the rival methods might be called the method of mean first
power; the other the method of mean zero powers. * * * [The last] method is
that described by Mr. Galton [(1875)] the same in principle as that which was
employed by Quetelet [(1846)]. The essence of this method is to note the
points between which are comprised quarters (eighths or other fractions) of
the total number of observations, and then to equate the distance thus given
by observations to the corresponding multiple of the modulus as assigned by
theory. For example, if we take two points so that between them there occur
half the total number of given observations, and outside each of them a
quarter of the total number, the distance between these two points ought
theoretically to be equal--is equatable--to twice the modulus x 0.476." On
pp. 190-191, Edgeworth discusses the use of the median. He finds that the
fluctuation [the square of the modulus $c$ (or twice the variance $\sigma^2$)]
of the distribution of medians of sets each consisting of m observations is
equal to the reciprocal of $2my^2$, where y is the maximum ordinate of the
probability curve divided by its area.
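
Both of Edgeworth's results for the normal law can be reproduced in a few
lines. The Python sketch below is illustrative only: the names RHO,
modulus_from_quartiles and median_fluctuation are introduced here, the
"sample" is an idealised set of equally probable points rather than real
observations, and the formula for the fluctuation of the median is taken in
its large-sample form.

```python
from math import pi, sqrt
from statistics import NormalDist, quantiles

# Distance of a quartile from the centre, in units of the modulus
# c = sigma * sqrt(2), for a normal law (about 0.4769).
RHO = NormalDist().inv_cdf(0.75) / sqrt(2)

def modulus_from_quartiles(obs):
    """Estimate the modulus by equating the interquartile distance to 2*c*RHO."""
    q1, _, q3 = quantiles(obs, n=4)
    return (q3 - q1) / (2 * RHO)

def median_fluctuation(m, y):
    """Large-sample fluctuation of the median of m observations: 1 / (2 m y^2)."""
    return 1.0 / (2.0 * m * y * y)

# Idealised "sample": 999 equally probable points of a unit normal law.
sample = [NormalDist().inv_cdf(i / 1000) for i in range(1, 1000)]
c_hat = modulus_from_quartiles(sample)   # should be close to sqrt(2)

# For a normal law of unit area the maximum ordinate is y = 1 / (c sqrt(pi)).
c = sqrt(2)
y_max = NormalDist(0, 1).pdf(0)
m = 25
# The fluctuation of the arithmetic mean of m observations is c^2 / m.
ratio = median_fluctuation(m, y_max) / (c * c / m)

print(RHO, c_hat, ratio, sqrt(ratio))
```

The ratio of fluctuations is pi/2, so that under the normal law the error of
the median exceeds that of the arithmetic mean by a factor of about 1.25.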

Edgeworth  (1886)  explores  in  detail  the  relative  advantages  of  the 
arithmetic  mean,  the  median,  and  the  mode,  with  less  attention  given  to 
other  possible  means.  On  the  grounds  of  precision,  he  declares  the  arith¬ 
metic  mean  to  be  superior  to  the  others  for  the  normal  law  and  others  near 
it,  but  says  the  median  is  better  "when  the  apex  of  the  curve  is  very  high 
and  its  extremities  very  much  extended."  (p.  167).  "In  respect  of  convenience, 
[the  Mode]  has  a  considerable  advantage  over  the  Arithmetical  Mean  and  a 
less  marked  advantage  over  the  Median."  (p.  168).  The  author  makes  passing
reference  to  the  use  of  the  quartile  deviation  by  Quetelet  (1846)  and  of 
quartiles  and  deciles  by  Galton  (1875)  in  estimating  the  probable  error,  and 
to  the  method  of  situation  and  the  most  advantageous  method  [Laplace  (1799, 
1812)]. 

Simon Newcomb (1886) considers the problem of combining a number of
observations of the same quantity so as to obtain the best result. He raises
two objections to the criterion for rejection of doubtful observations
proposed by Peirce (1852): (1) It disregards any a priori knowledge of the
probable error of the observations and seeks to determine it from the
observations themselves; and (2) It does not take account of the fact that the
a priori probability of an observation varies from one observer to another.
Given n observations assumed to have come from a generalized law of error
which is a mixture of m normal laws, with proportions $p_i$ having precision
$h_i$ ($i = 1, \ldots, m$), Newcomb says the best result is a weighted mean
(with the weights proportional to the probabilities of the hypotheses on which
they depend) of $m^n$ weighted means of the observations, each mean being
obtained by making a hypothesis concerning the distribution of the m measures
of precision among the n observations.
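
The structure of Newcomb's estimate, a weighted mean of $m^n$ weighted means,
can be sketched as follows. This is one reading of the description above, not
a transcription of Newcomb's own computation: it assumes each normal component
has density $(h/\sqrt{\pi})\,e^{-h^2(x-\mu)^2}$ and that nothing is known a
priori about the true value $\mu$ (a flat prior). The function name and the
data are hypothetical.

```python
from itertools import product
from math import exp, sqrt

def newcomb_mean(obs, props, precisions):
    """Weighted mean of the m**n weighted means, one per assignment of
    precisions to observations, each weighted by the probability of that
    assignment given the observations (flat prior on the true value)."""
    total_weight = 0.0
    total = 0.0
    for assign in product(range(len(props)), repeat=len(obs)):
        w = [precisions[k] ** 2 for k in assign]      # weights h^2
        sw = sum(w)
        mu = sum(wi * x for wi, x in zip(w, obs)) / sw
        s = sum(wi * (x - mu) ** 2 for wi, x in zip(w, obs))
        prob = exp(-s) / sqrt(sw)                     # integral over mu
        for k in assign:
            prob *= props[k] * precisions[k]          # p_k * h_k per observation
        total_weight += prob
        total += prob * mu
    return total / total_weight

# Hypothetical data: four concordant observations and one discordant.
obs = [1.0, 1.2, 0.9, 1.1, 4.0]
props = [0.9, 0.1]        # proportions of good and poor observations
precisions = [2.0, 0.3]   # corresponding measures of precision h

print(sum(obs) / len(obs))                    # plain arithmetic mean
print(newcomb_mean(obs, props, precisions))   # pulled far less by 4.0
```

Because the number of hypotheses grows as $m^n$, direct enumeration of this
kind is practical only for small sets of observations.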

Edgeworth  (1887a) ,  in  discussing  the  diversity  of  methods  for  the  treat¬ 
ment  of  discordant  observations,  makes  the  following  statement  (p.  365): 
"Different  methods  are  adapted  to  different  hypotheses  about  the  cause  of  a 
discordant  observation;  and  different  hypotheses  are  true,  or  appropriate, 
according  as  the  subject-matter,  or  the  degree  of  accuracy  required,  is 
different."  He  specifies  three  hypotheses  and  divides  the  different  methods 
of  treating  discordant  observations  into  four  groups.  He  rates  the  first
three  types  of  method  on  their  appropriateness  under  each  of  the  hypotheses, 
deferring  discussion  of  the  fourth  method  (use  of  the  median  instead  of  the 
arithmetic  mean)  to  a  later  paper  [Edgeworth  (1887d)]. 

Edgeworth (1887b) drops Boscovich's Condition (I) [that the sums of
positive and negative deviations be equal in magnitude] and uses only his
Condition (II) [that the sum of the absolute values of the deviations be a
minimum], which, as we have already seen, requires the choice of the median
rather than the arithmetic mean. Edgeworth gives the following description
(pp. 279-282) of his procedure: "It is proposed here to treat those
difficulties in the reduction of observations which are peculiar to the case
of plural quaesita. * * * Consider, first, the simple case in which there are
only two quaesita. Let the given equations be of the form
$a_1x + b_1y - w_1 = 0$, $a_2x + b_2y - w_2 = 0$, &c., where $w_1$, $w_2$,
&c., are observations subject to equal error. According to the usual
procedure we obtain for one locus (of the sought point xy) the 'normal
equation' $a_1[a_1x + b_1y - w_1] + a_2[a_2x + b_2y - w_2] +$ &c. $= 0$; which
may be thus interpreted. Substitute any assigned value for y in the original
equations. Of the n values for x thus presented, the (weighted) Arithmetical
Mean is given by substituting the assigned value for y in the 'normal'
equation. The analogous procedure is to find a locus such that if we
substitute any assigned value of y in the original equations, the Median of
the corresponding n values of x may be given by the locus. The series of
points, which in the case of the Arithmetical Mean is obtained by a single
stroke of analysis, must, in the case of the Median, be traced one by one.
That is, we must substitute in the given equations successive values of y
(e.g. 0, δ, 2δ, &c.), find the Median value for x corresponding to each
assigned y, and plot the series of points. A second Median Curve is afforded
by the Medians of the y components; and the intersection of these Median
Curves gives the Median Point. The method is perfectly general. As an
illustration we may take the case of two quaesita, x and y; the equations for
which involve only one of the variables. The Mean loci are in this case
parallel to the axes. And it follows from considerations which I have
elsewhere [Edgeworth (1887d)] put together, that the Median, as compared with
the Arithmetical Mean, affords a solution nearly as good when the typical
[normal] probability-curve prevails, and better when the observations are
'discordant'." The author (p. 280) calls the Boscovich-Laplace method [Laplace
(1799), Sec. 40] a "remarkable hybrid between the Method of Least Squares and
the Method of Situation", because Boscovich's Condition (I) requires that
deviations be taken from the arithmetic mean, as in the method of least
squares, instead of from the median, as in Edgeworth's version of the method
of situation, where that condition has been dropped.
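
The tracing of one Median Curve, point by point, can be sketched in Python.
The sketch makes an assumption not spelled out in the passage quoted: that the
median of the n values of x is a weighted median with weights |a_i|, which is
the choice that minimizes the sum of the absolute residuals for the assigned
y. All names and the equations themselves are hypothetical, and every a_i is
assumed to be nonzero.

```python
def weighted_median(values, weights):
    """Return a value t that minimizes sum(w * |v - t|)."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= half:
            return value
    raise ValueError("no observations")

def median_locus_x(equations, y):
    """Point of the x Median Curve for an assigned y.
    Each equation is a triple (a, b, w) standing for a*x + b*y - w = 0."""
    values = [(w - b * y) / a for a, b, w in equations]
    weights = [abs(a) for a, _, _ in equations]
    return weighted_median(values, weights)

def abs_residual_sum(equations, x, y):
    """Sum of the residuals, each taken positively."""
    return sum(abs(a * x + b * y - w) for a, b, w in equations)

# Hypothetical equations x + t*y = w; the first four are satisfied exactly
# by x = 2, y = 0.5, and the fifth is discordant.
equations = [(1.0, 1.0, 2.5), (1.0, 2.0, 3.0), (1.0, 3.0, 3.5),
             (1.0, 4.0, 4.0), (1.0, 5.0, 9.0)]

# Substitute successive values y = 0, delta, 2*delta, ... and record the
# Median value of x for each.
delta = 0.25
locus = [(median_locus_x(equations, k * delta), k * delta) for k in range(5)]
for x, y in locus:
    print(x, y, abs_residual_sum(equations, x, y))
```

The second Median Curve is obtained by exchanging the roles of x and y (with
weights |b_i|); the Median Point is sought where the two curves meet.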

Edgeworth (1887c) writes as follows (pp. 222-225) concerning the method
developed in the preceding paper: "The method may be thus described in the
case of two variables, x and y. Find an approximate solution by some rough
process (such as simply adding together several of the equations so as to form
two independent simultaneous equations). Take the point thus determined as a
new origin, and substitute in the n (transformed) equations for one of the
variables x a series of values ±δ, ±2δ, &c. Corresponding to each of these
substitutions we have n equations for y. For each of these systems determine
the Median according to Laplace's Method of Situation. This series of Medians
forms one locus for the sought point. A second locus is found by transposing
x and y in the directions just given. The intersection of these loci is the
required point. The method may be extended to any number of variables. * * * The
advantages claimed for the new method are that, while in the typical case of
the laws of facility being all [normal] Probability Curves, the generalized
Method of Situation is only slightly less accurate, and considerably less
laborious, than the Method of Least Squares; in the special case of
Discordant Observations the proposed method is not only more convenient, but
better. It is much to be wished that some practical astronomer would give this
method a trial by employing it in some laborious and important calculation."

Edgeworth (1887d) considers the question of the choice of means in the
special case of discordant observations. On p. 270 he writes: "The criterion
whether the Median or Arithmetic Mean is the better reduction is presumably
the character of the correlated Probability-Curve. The reduction which
corresponds to the smaller Modulus is presumably the better; since thus we
obtain a smaller 'probable' error * * *. Which of the reductions will have the
smaller Modulus will depend on the character of our facility-curve. For
[normal] Probability-Curves, and presumably functions in their neighborhood,
it is shown by Laplace [(1812), Supplement 2] that the Arithmetic Mean has the
advantage. But for curves whose head reaches high, while their extremities
stretch out far, the Median has the advantage. Now the grouping of Discordant
Observations is apt to assume this form. Accordingly the Median is proposed as
the Mean proper to this class of observations. If we have been deceived by the
appearance of Discordance * * * and the facility-curve was really a normal
Probability-Curve, yet we shall have lost little by taking the Median instead
of the Arithmetic Mean. For the error of the former is of the same order as
(only 1.3 [times] greater than) the error of the latter. And, if the
observations are really discordant, the derangement due to the larger
deviations will not be serious, as it is for the Arithmetic Mean." The author
gives three numerical examples.

Edgeworth (1887e) gives two tests of symmetry, one based on a comparison
of the sums of the positive and negative deviations from the median and the
other on a comparison of the arithmetic mean and the median of the
observations. The critical value which he gives for the latter test is
erroneous; since the arithmetic mean and the median are correlated, the
variance of their difference is not the sum of their variances, but that sum
decreased by twice their covariance.
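
The size of the correction can be illustrated for the normal case. The
following is a sketch under assumptions not stated in the text: a sample of
large size $n$ from a normal law with standard deviation $\sigma$, with
$\bar{x}$ denoting the arithmetic mean and $\tilde{x}$ the median (notation
introduced here). It is not a reconstruction of Edgeworth's own calculation.

```latex
\operatorname{Var}(\bar{x} - \tilde{x})
  = \operatorname{Var}(\bar{x}) + \operatorname{Var}(\tilde{x})
    - 2\,\operatorname{Cov}(\bar{x}, \tilde{x}).

% Normal law, large n:
\operatorname{Var}(\bar{x}) = \frac{\sigma^{2}}{n}, \qquad
\operatorname{Var}(\tilde{x}) \approx \frac{\pi\sigma^{2}}{2n}, \qquad
\operatorname{Cov}(\bar{x}, \tilde{x}) = \frac{\sigma^{2}}{n},

% so that
\operatorname{Var}(\bar{x} - \tilde{x})
  \approx \Bigl(\frac{\pi}{2} - 1\Bigr)\frac{\sigma^{2}}{n}
  \approx 0.571\,\frac{\sigma^{2}}{n},
\quad\text{whereas}\quad
\operatorname{Var}(\bar{x}) + \operatorname{Var}(\tilde{x})
  \approx 2.571\,\frac{\sigma^{2}}{n}.
```

Under these assumptions, treating the two statistics as uncorrelated would
overstate the variance of their difference by a factor of about 4.5.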

H. H. Turner (1887) comments as follows (pp. 466-470) on the method of
Edgeworth (1887b,c): "Mr. Edgeworth invites attention to a method of reducing
observations relating to several quantities, which he has suggested as a
substitute for the ordinary process of the 'Method of Least Squares'. I have
applied this method to an example for a particular case of two variables, and
venture to offer the following remarks and suggestions for consideration.
[Here follows a quotation from Edgeworth (1887c).] Much of the labour of this
process, and sometimes the preliminary search for an approximate solution, may
be avoided by the use of a graphical method [which the author describes]. * * *
In the method of least squares the normal equations have a unique solution;
but the intersection of two broken lines may be a series of points, and the
two median loci may also have a common portion. The solution then becomes to
some extent indeterminate. * * * It is possible that the number and
distribution of these points of intersection afford real information as to the
value and accordance of the observations. But, in practice, a single solution,
although its singularity may be somewhat fictitious, is preferable to a
variety; and unless some additional criterion for extracting a single solution
from the median loci can be obtained, it is to be feared that we have here a
somewhat serious objection to this Method on the score of convenience. * * *
Mr. Edgeworth claims as advantages for the new method that (1) It is
considerably less laborious than the Method of Least Squares. (2) In the case
of discordant observations it is theoretically better. So far as my slight
experience entitles me to express an opinion on these points, I should say
that (1) is very doubtful. In trying a new Method more time is liable to be
wasted; but there would, I imagine, never be quite the same
straightforwardness about the new method which makes the method of least
squares so easy, although somewhat long. (2) is somewhat counterbalanced by
the failure to give a unique solution".
Edgeworth (1888) restates (p. 184) the method he proposed a year earlier
[Edgeworth (1887b,c)]: "A substitute for the Method of Least Squares has been
proposed by me, based upon the following principle. The data being of the form
$a_1x + b_1y$ * * * $- v_1 = 0$; $a_2x + b_2y$ * * * $- v_2 = 0$, &c., (where
$v_1$, $v_2$, &c. are observations of equal worth), a solution is obtainable
by taking x, y * * * such that the sum of the residuals (the left-hand members
of the above written equations), each residual taken positively, should be a
minimum". In a footnote (pp. 184-185) he writes: "This rule is derivable from
the hypothesis that the law of error, the facility-curve under which the
observations range, is of the form $y = (h/2)e^{-hx}$, x taken positively in
both directions [Laplace's first law of error; see Laplace (1774)]. But the
use of the rule does not commit us to the assumption of the hypothesis. The
Method of Least Sum is in this respect exactly on a par with the Method of
Least Squares. The rule of the latter Method is--Determine x and y so that the
sum of the squares of the residuals may be the least possible. This rule is
derivable from, and specially correlated with, the hypothesis that the law of
facility is the [normal] Probability-curve. But it is thought legitimate by
Laplace and other eminent authorities to apply the rule even where the
hypothesis is not assumed. No doubt the use of either method divorced from the
law of facility appropriate to it is open to logical objections. But the
difficulties are not greater for one method than for the other." The author
continues (pp. 185-186): "The point thus designated must be on each of two, or
more, loci analogous to the normal equations of the ordinary method.
Accordingly the intersection of the 'Median loci' was at first proposed by me
as the solution. But Mr. Turner [(1887)] has shewn that these loci are apt to
have in common, not only several points, but even lines and spaces. * * * In
this event common sense teaches us that we should adopt the middle of the
indeterminate tract as the best point; and this presumption is confirmed by a
formal calculation of utility such as Laplace [(1812)], in the simplest case
of a single unknown quantity, has employed to discover the 'most advantageous
point'." Having thus disposed of one of Turner's criticisms, Edgeworth
endeavors (somewhat less successfully, it seems to the present writer) to
answer the other, namely that the method is not less laborious than the method
of least squares, as Edgeworth [(1887b,c)] had asserted.
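
The correspondence between each rule and its law of facility, as described in
Edgeworth's footnote, can be written out briefly. The sketch below uses $h$
as in the quotation; the symbol $r_i$ for the $i$-th residual and $L$ for the
joint probability of the residuals are notation introduced here, and $h$ is
assumed to be known and the same for every observation.

```latex
% Laplace's first law: y = (h/2) e^{-h|r|}
L_1 = \prod_{i=1}^{n} \frac{h}{2}\, e^{-h\lvert r_i\rvert}
\quad\Longrightarrow\quad
\log L_1 = n \log\frac{h}{2} - h \sum_{i=1}^{n} \lvert r_i \rvert ,

% Normal law: y = (h/\sqrt{\pi}) e^{-h^2 r^2}
L_2 = \prod_{i=1}^{n} \frac{h}{\sqrt{\pi}}\, e^{-h^{2} r_i^{2}}
\quad\Longrightarrow\quad
\log L_2 = n \log\frac{h}{\sqrt{\pi}} - h^{2} \sum_{i=1}^{n} r_i^{2} .
```

Making $L_1$ greatest is therefore the same as making $\sum \lvert r_i
\rvert$ least, and making $L_2$ greatest is the same as making $\sum r_i^2$
least.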

Joseph Bertrand (1888a) states that the Gaussian law of probability is
the only one for which, among several observations made under the same
conditions, the mean value is the most probable. Given any other law for the
probability of errors, it is possible to specify the combination of a series
of measurements which will give the most probable value. The converse is not
true; given a combination of observations, in most cases there is no
probability law for which that combination gives the most probable value. For
example, he says, there is no probability law for which the geometric mean or
the harmonic mean of a number of observations is the most probable value. The
latter part of the paper deals with the rejection of observations, for which
Bertrand (1888b) proposes a new criterion. The author repeats and elaborates
on these results in his book [Bertrand (1889)].

Faye (1888) points out that the greatest discrepancy from the mean of
a set of observations is not likely to be counterbalanced by a discrepancy
of nearly the same magnitude but of opposite sign, and hence unless the number
of observations in the set is very large it is likely to have an undue
influence on the arithmetic mean, so that the arithmetic mean of all the
observations is not the most probable value. Given forty observations, he
computes the arithmetic means of the largest and smallest observations, the
second largest and second smallest, and so on (the midrange and the
quasi-midranges). The former has the value 4.315 and the latter range from
3.815 to 3.995. He attributes this discrepancy to the fact that the largest
deviation from the arithmetic mean, 6.35-3.93 = 2.42, is not matched by a
comparable deviation in the opposite direction. He therefore recommends
rejecting the largest observation, which changes the arithmetic mean from 3.93
to 3.87, a value which he considers to be more probable. In general, however,
he holds that observations should be rejected only if they are considered
doubtful at the time they are made, or at least before any computations have
been made. Otherwise the calculator can too easily make the results agree with
his preconceived and sometimes erroneous opinion.
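
The comparison of the midrange with the quasi-midranges is easily mechanized.
The Python sketch below illustrates the computation on a small set of
hypothetical observations; it does not use Faye's forty observations, which
are not reproduced in this survey, and the function name is introduced here.

```python
def quasi_midranges(obs):
    """Means of the i-th smallest and i-th largest observations, i = 1, 2, ...
    The first entry is the midrange; the rest are the quasi-midranges."""
    ordered = sorted(obs)
    n = len(ordered)
    return [(ordered[i] + ordered[n - 1 - i]) / 2 for i in range(n // 2)]

# Hypothetical observations with one unmatched large value.
obs = [3.1, 3.6, 3.8, 3.9, 4.0, 4.2, 4.4, 6.4]

pairs = quasi_midranges(obs)
mean_all = sum(obs) / len(obs)
mean_without_largest = (sum(obs) - max(obs)) / (len(obs) - 1)

print(pairs)                 # midrange stands apart from the quasi-midranges
print(mean_all, mean_without_largest)
```

A midrange well outside the span of the quasi-midranges signals that one
extreme observation has no counterpart on the opposite side.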

Galton (1888,1889) advocates the use of the median as a measure of
central tendency and the quartile deviation as a measure of dispersion. In the
latter publication, he writes: "The median, M, has three properties. The
first follows immediately from its construction, namely, that the chance is
an equal one, of any previously unknown measure in the group exceeding or
falling short of M. The second is, that the most probable value of any
previously unknown measure in the group is M. * * * The third property is that
whenever the curve of the Scheme [of distribution] is symmetrically disposed
on either side of M * * *, then M is identical with the ordinary Arithmetic
Mean or Average." (p. 41). "As the M [median] measures the Average Height of
the curved boundary of a Scheme, so the Q [quartile deviation] measures its
general slope. * * * Our Q has the further merit of being practically the same
as the value which mathematicians call the 'Probable Error' * * *." (p. 53).

Emanuel Czuber (1890) advocates the method of maximum likelihood to find
the most probable system of values of the unknown elements p, q, r, ... in the
law of error $\phi(x)$ of specified form, given that $x_1, x_2, \ldots, x_n$
are the errors of n observations. He attributes this method to Gauss (1809),
but we have already seen that it was used even earlier by Lambert (1760) and
by D. Bernoulli (1778). The author enumerates three conditions under which the
usual method of finding the maximum likelihood estimates, based on solving
likelihood equations formed by equating to zero the partial derivatives of the
likelihood function $\phi(x_1)\phi(x_2)\cdots\phi(x_n)$, fails.

J. E. Estienne (1890) endeavors to prove that "the best value to adopt as
measure of a quantity of which experience has furnished values tainted by
accidental errors, is, in every case, the median value * * *." In Chapter I, he
proves the following theorem (p. 241): "The most probable value, determined
by the rule of the median value, is that for which the arithmetic sum of the
deviations is a minimum." Chapter II is devoted to the proposition that the
rule of the median value is independent of the law of errors and should be
applied to the exclusion of every other rule, whatever be this law. Chapter
III deals with the consequences of the law of the median value, of which the
author gives several, including the use of the method of least first powers
in solving n inconsistent equations in m unknowns (m < n). An example dealing
with artillery fire is given in the appendix. The author is, of course,
incorrect in declaring the universal superiority of the median, but no more so
than earlier authors who insisted on that of the arithmetic mean.

Czuber (1891a) gives a detailed study of the theory of linear observational
errors, the method of least squares, and the theory of errors in the
plane and in space. The first six sections of Part 1 deal with laws of error
[with particular emphasis on the normal (Gaussian) law, though Laplace's
first law and others are mentioned]; the early work of Simpson (1756,1757)
and Lagrange (1774) on the advantages of taking averages, of Laplace (1774,
1781) on his "most advantageous method", and of Daniel Bernoulli (1778) on
the method of maximum likelihood; the work of Legendre (1805), Adrain (1808),
Gauss (1809), Laplace (1812), and later writers on the method of least squares;
and the problem of the choice of means and its relation to the choice between
the method of least squares and rival methods. Sections 7 and 8 deal with
estimation of the precision of a series of observations from the true errors
and the apparent errors (residuals), respectively. Section 9 compares the
(normal) error law with experience. Section 10 deals with the smallest and
largest errors in a set of observations, and Section 11 deals with the
treatment of outlying observations. Part 2 is concerned with the details of the
computational procedure for the method of least squares, and Part 3 deals with
the theory of errors in the plane and in space.

Czuber (1891b) examines the rule proposed by Estienne (1890) and its
consequences, and shows that the median and the method of least absolute first
powers, far from being valid whatever the law of error, as asserted by Estienne,
are, as pointed out by Glaisher (1872), tied just as firmly to the first law
of error of Laplace (1774) as are the arithmetic mean and the method of least
squares to Laplace's second (Gauss') law.

P. Pizzetti (1892) gives a useful summary of work to date on the theory
of  errors,  with  a  bibliography  of  S03  items,  but  presents  few  if  any  new  results . 

Edgeworth (1893), in connection with a study of averages of correlated
observations, proposes two principles of wider application: "(1) When
observations are combined according to a system of weights different from that
which is known to be best, it is in general advantageous to reject a certain
class of the given observations. (2) When, as usual, the observations range
under a [normal] probability curve, the median m corrected by the quartiles
$q_1$ and $q_2$ affords a formula for the Mean, viz.
$(1.2m + q_1 + q_2) \div 3.2$, which is more
accurate than that method of combining such observations which has hitherto
been supposed to be the most accurate, viz. the Arithmetic Mean. The principle
may be applied with great ease and advantage to Discordant Observations." The
author's statement that his new formula gives a result more accurate than the
arithmetic mean for observations from a normal error law is incorrect. The
source of his error lies in the assumption that the quartiles are independent
of each other and of the median, whereas they are actually considerably
correlated, as Karl Pearson (1920) has pointed out.

P. J. Ed. Goedseels and Paul Mansion (1893) discuss the theory of errors.
Goedseels observes that in reality no one has ever established the method of
least  squares  in  an  absolutely  conclusive  manner,  and  that  one  should  avoid 
assigning  too  great  objective  value  to  the  results  to  which  it  leads.  Mansion 
is of the same opinion. In reality, he says, the only definition which one
can give of accidental errors is this: They are the errors which are
eliminated by the method of least squares. But it is just to recall that Gauss
took care to express himself with precision on what is arbitrary in the theory
to which he gave such a perfect form. The great advantage of the method of least
squares is that it allows the combination, in a simple and reasonable manner,
of the results of observations of unequal value in a condensed form. This
discussion is of interest because the budding dissatisfaction with the method
of least squares expressed here later led to substantial contributions by the
authors to the theory of rival methods.

Karl Pearson (1895), in connection with the first exposition of the
Pearson system of frequency curves, discusses the relative position of mean,
median, and mode of samples from a Pearson Type III distribution. He finds that
the median lies about one third of the way from the mean to the mode.

Henri Poincare (1896) discusses the theory of errors and the arithmetic
mean, justification of the law of Gauss, errors in the position of a point, and
the method of least squares. He expresses doubt as to the universal validity
of the Gaussian law of error (and with it the arithmetic mean and the method
of least squares), but offers no specific alternatives. Twice he raises the
question  as  to  whether  one  should  reject  outlying  observations,  but  offers  no 
definitive  answer. 

Fechner (1897) discusses various laws of error and various averages
(arithmetic mean, median, and mode), as well as the largest and smallest
values, their difference (the range) and their sum (twice the midrange), and
compares theoretical and observed values for several of these statistics.

Czuber (1899) applies the theory of probability to the results of
measurements. He discusses the arithmetic mean and other averages, including the
median and the power means [Fechner (1874)], and the estimation of the
precision of a series of observations on the basis of the true errors. He
mentions the method of situation of Laplace (1793,1799), based on the two
conditions of Boscovich, as the first attempt at a systematic solution of the
problem of solving an inconsistent system of observational equations. He devotes
several sections to demonstrations of the method of least squares based on the
work of Gauss (1809), Laplace (1812), and Gauss (1823) and related material. He
also discusses measures of precision based on apparent errors and differences
of observations, the comparison of theoretical and observed distributions,
the largest and smallest errors in a series of observations, and the treatment
of outlying observations.

Goedseels (1900) mentions three methods of solving a set of simultaneous
equations greater in number than the number of unknowns--the method of Tobias
Mayer (1750), the method of least squares, and the method of Cauchy (1837).
He makes a detailed study of the method of Mayer and establishes the four
following propositions: (1) The method of Mayer used in our days differs
appreciably from the original method; (2) The modern method is susceptible to
an important simplification; (3) There is room for returning, in certain cases,
to the primitive procedure; (4) Both the primitive and modern procedures offer
certain advantages not previously pointed out (especially in the case of only
two unknowns).

A. A. Markoff (1900) devotes a chapter of his book on the calculus of
probabilities to the method of least squares.

Goedseels  (1901)  proposes  a  simplification  of  the  method  of  Cauchy  (1837) 
for  solving  a  system  of  m  linear  equations  in  n  unknowns ,  m>n.  Goedseels  (1902) 
proposes  an  application  of  Cauchy's  method  to  least  squares. 

Czuber (1903) discusses the theory of errors of observation, including laws
of error, various measures of precision (including the mean absolute error, the
square root of the mean square error, the probable error, the m-th root of the
mean of the m-th powers of the absolute value of the error, and the mean
difference of all pairs of observations), and the method of least squares.

J. C. Kapteyn (1903) develops the theory of skew frequency curves from a
different point of view than that of Pearson (1895). He discusses the median
and the method of calculating it for his theoretical curves, and compares its
values in several examples with those of the mode and the arithmetic mean.

S.  A.  Saunder  (1903)  advocates  the  use  of  Peirce's  criterion  for  the 
rejection  of  doubtful  observations,  for  which  he  gives  a  new  table  of  critical 
values.  He  also  discusses  the  alternative  to  rejection  of  observations  pro¬ 
posed  by  DeMorgan  (1847)  and  Glaisher  (1873) . 

Mansion  (1906)  summarizes  important  contributions  to  the  history  and 
the  critique  of  the  method  of  least  squares  contained  in  extracts  of  letters 
and papers of Gauss in the eighth volume of Gauss' collected works (published
in 1900). He makes brief mention of the theory of combination of observations,
introduced by Laplace (1786) [see also Laplace (1793,1799)], in which the
largest error (in absolute value) is smaller than for any other system. He
points out that Gauss (1809) criticized this method on the grounds that it
uses for the final calculation of the unknowns only a number of equations
equal to the number of unknowns; the other equations are used only to decide
the choice which one should make.


Goedseels (1909) proposes two methods, which he calls the most
approximative method and the method of minimum approximation, for solving a
system of n equations in p unknowns (n>p). In the most approximative method one
assumes that the intervals $(A_i, B_i)$ containing the respective residual errors
$r_i$ are known, and seeks to determine for each unknown (say x) the smallest
interval (I,S) containing that unknown, i.e. an interval such that for every
value of x less than I or greater than S, one or more of the residues $r_i$
lie outside the given interval. If all observations are equally trustworthy
and if positive and negative errors are equally likely, then
$A_1 = A_2 = \cdots = A_n = A$, $B_1 = B_2 = \cdots = B_n = B$, and
$-A = B = M$ (say). Consider a series of equations in a single unknown of the
form x = m, having the same approximation, M, and suppose the equations are
arranged in order of increasing values of m: $x = m_1 + r_1$,
$x = m_2 + r_2$, ..., $x = m_n + r_n$. Then the most approximative value is the
midrange, $(m_1 + m_n)/2$, and the approximation of this value is the difference,
$M - (m_n - m_1)/2$, between the given approximation, M, and the semirange,
$(m_n - m_1)/2$. When $M - (m_n - m_1)/2 = 0$, the midrange $(m_1 + m_n)/2$ is
the exact value of the unknown. When $M - (m_n - m_1)/2 < 0$, the data are
absurd. When the limits of the errors are not known, Goedseels advocates use of
the method of minimum approximation, which is the same as the method of Laplace
(1786,1793,1799) in which the maximum absolute deviation (residual) is
minimized; in the case of a single unknown, the result (average) obtained is
the midrange. Goedseels discusses various other methods, including the method
of least squares and the empirical methods of Mayer (1750) and Cauchy (1837).
In summary, he states that he prefers the most approximative method when the
limits of error are known and the minimum approximation otherwise, especially
in very important questions, even though the calculations become quite
laborious for p>2; in questions of lesser importance, he says the method of
least squares or even one of the empirical methods may be used to save labor.
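
The single-unknown case of the most approximative method described above can
be sketched in a few lines of Python. This is only an illustration under stated
assumptions: the function name, the sample measurements, and the bound M = 0.5
are hypothetical and are not taken from Goedseels. Each equation x = m_i + r_i
with |r_i| <= M confines x to the interval (m_i - M, m_i + M), and the
admissible interval (I, S) is the intersection of these.

```python
def most_approximative(measurements, M):
    """Single unknown x with equations x = m_i + r_i and |r_i| <= M.

    Returns (I, S, midrange, approximation). All names are illustrative.
    """
    m_min, m_max = min(measurements), max(measurements)
    I = m_max - M            # x >= m_i - M must hold for every i
    S = m_min + M            # x <= m_i + M must hold for every i
    if I > S:
        # M is smaller than the semirange: no x satisfies every bound
        raise ValueError("the data are absurd for this value of M")
    midrange = (m_min + m_max) / 2
    approximation = M - (m_max - m_min) / 2   # half-width of (I, S)
    return I, S, midrange, approximation


# Hypothetical measurements of one quantity, each assumed good to within 0.5
I, S, x_hat, approx = most_approximative([9.8, 10.1, 10.4], M=0.5)
```

The midrange is the centre of (I, S) and the approximation is its half-width,
so the interval collapses to a single point exactly when M equals the semirange.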

Charles J. de la Vallee Poussin (1909,1911) states and proves the
following theorems concerning the minimum approximation, which assume that
any n of the first members of the system of equations
(1) $a_i x + b_i y + \cdots - m_i = 0$ $(i = 1, 2, \ldots, m;\ m > n)$
form a system of linearly independent expressions:
I. The values of the unknowns which provide the minimum approximation M of
the system (1) give at least n+1 residues attaining this limit M in absolute
value. II. The minimum approximation and the corresponding values of the
unknowns, for a system of n+1 equations in n unknowns, are obtained by general
formulas. III. If m>n+1, the minimum approximation of a system of m equations
in n unknowns is that of a certain system of n+1 equations which are part of
the proposed system. One can deduce from these theorems an iterative
procedure for determining the minimum approximation.
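
As a rough illustration of Theorems I to III, the following Python sketch fits
a straight line y = a + b t (n = 2 unknowns) by minimizing the largest absolute
residual. It is not the iterative procedure of de la Vallee Poussin: it simply
enumerates every subsystem of n+1 = 3 equations, solves each in closed form, and
keeps the one with the largest deviation. That the governing subsystem is the
one with the largest deviation is a standard fact of minimax (Chebyshev)
approximation and is an assumption added here, as are the function names and
the sample data.

```python
from itertools import combinations


def reference_deviation(triple):
    """Minimax deviation h and line (a, b) for three points with distinct t."""
    (t1, y1), (t2, y2), (t3, y3) = triple
    # Multipliers with sum(lam) = 0 and sum(lam * t) = 0
    lam = (t3 - t2, t1 - t3, t2 - t1)
    num = lam[0] * y1 + lam[1] * y2 + lam[2] * y3
    h = abs(num) / sum(abs(l) for l in lam)
    sgn = 1.0 if num >= 0 else -1.0
    # The residual y_i - a - b*t_i equals s_i * h with s_i = sgn * sign(lam_i)
    s1 = sgn if lam[0] > 0 else -sgn
    s2 = sgn if lam[1] > 0 else -sgn
    b = ((y1 - s1 * h) - (y2 - s2 * h)) / (t1 - t2)
    a = y1 - s1 * h - b * t1
    return h, a, b


def minimax_line(points):
    """Brute force over all 3-point subsystems; returns (a, b, h, worst)."""
    h, a, b = max(reference_deviation(tri) for tri in combinations(points, 3))
    worst = max(abs(y - a - b * t) for t, y in points)  # over the full system
    return a, b, h, worst


# Hypothetical data: y = t**2 observed at t = 0, 1, 2, 3
a, b, h, worst = minimax_line([(0, 0), (1, 1), (2, 4), (3, 9)])
n_attaining = sum(
    abs(abs(y - a - b * t) - h) < 1e-9
    for t, y in [(0, 0), (1, 1), (2, 4), (3, 9)]
)
```

In this example the largest residual over all four equations equals the
deviation h of the selected three-equation subsystem, and at least n+1 = 3
residuals attain it, in line with Theorems I and III.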

Goedseels (1910) summarizes the results given in his book [Goedseels
(1909)] on the most approximative method, the method of minimum approximation,
and the method of least squares, and applies all three methods to a numerical
example involving the compensation of the coordinates of the vertices in
a topographic survey. He insists that the method of least squares is not
as good as the other two methods, even in the case of normally distributed
errors, for which it gives the most probable values, since these values are
inadmissible if they lie outside the interval (I,S) of the most approximative
method. In the numerical example, he first ascertains that the maximum error
e specified by the observer is admissible, i.e. that it is not less than
the minimum approximation M. Then he proceeds to compensate the z-coordinates
of the data points by the most approximative method and by the method
of least squares. He rejects the results of the latter method as inadmissible,
since the z-coordinate it gives for one point lies outside the interval given
by the most approximative method. Finally, he ignores the maximum error
stated by the observer and compensates the z-coordinates by the method of
minimum approximation. The resulting values lie within the intervals given
by the most approximative method, and the author shows that this must always
be true if any admissible value $e \geq M$ (where M is the minimum approximation)
is designated as e by the observer.

Goedseels (1911) calls attention to an interesting study of the method
of minimum approximation, including a simplification of that method, proposed
by de la Vallee Poussin (1909,1911). He proposes, in turn, two other
simplifications applicable both to that method and to the most approximative
method. These simplifications apply only when there are only one or two
unknowns, of which at least one is positive. The latter is really no
restriction at all, since the data can be transformed so that at least one
unknown is positive. The restriction to one or two unknowns is less serious
than one might think, since one always proceeds by successive elimination of
the unknowns, and the number of them is always eventually less than three;
moreover it is in the final stages, where this condition is satisfied, that the
methods in question become most complicated. In an abstract reprinted as a
footnote on pp. 351-352, de la Vallee Poussin states that, with the
simplifications proposed by himself and by Goedseels, the calculations for the
method of minimum approximation are even simpler than those for the method of
least squares.


G. Udny Yule (1911), in a textbook that has gone through many editions,
discusses averages [including the median (Mi) and its relation to the
arithmetic mean (M) and the mode (Mo)], measures of dispersion [the range, the
mean deviation about the median, and the quartile deviation
$Q = (Q_3 - Q_1)/2$, where $Q_1$ and $Q_3$ are the quartiles], and measures of
skewness [$(M - \mathrm{Mo})/S \approx 3(M - \mathrm{Mi})/S$ (Pearson's measure,
where S is the standard deviation) and $(Q_1 + Q_3 - 2\mathrm{Mi})/2Q$].
He also discusses the standard errors of quantiles (median, quartiles, deciles,
etc.) and of the semi-interquartile range (quartile deviation Q), as well as
the correlation between the errors in two quantiles.

L. Tits (1912) states and proves three theorems which embody further
simplifications, beyond those of de la Vallee Poussin (1911) and Goedseels
(1911), of the most approximative method and the method of minimum
approximation. He uses these theorems to simplify the solution of a numerical
example given by Goedseels (1911).

Edward Lewis Dodd (1913) points out that Czuber (1891) recounted many
of the attempts that have been made to relate the principle of the arithmetic
mean as the most probable value with the Gaussian probability law, but quoted
from Bertrand (1889), p. 180, an example to show that this law and principle
are not strictly compatible. The author endeavors to show this incompatibility
by other means. Specifically, he shows that, under certain conditions, the
quadratic mean (root-mean-square) of two measurements from a Gaussian
distribution has a greater probability than the arithmetic mean; also, there
exist positive values of b (less than unity) for which the probability of bm is
greater than that of m (the arithmetic mean). The probability of the median
of three or more measurements from a Gaussian distribution is, however,
always less than that of the arithmetic mean, the ratio of probabilities
approaching $\sqrt{2/\pi} = .7979$ asymptotically.

Edgeworth (1913) gives empirical confirmation of Pearson's rule as to the
relation between the arithmetic mean, the median, and the mode. He computes
median, quartiles, deciles, and mean deviations for sums of 25 and of 16 random
digits, and compares them with the theoretical values.

H. M. Goodwin (1913) discusses the rejection of observations, for which he
gives a new criterion, which involves computing the arithmetic mean and the
average  deviation,  omitting  the  doubtful  observation,  and  rejecting  that 
observation  if  its  deviation  from  the  mean  is  greater  than  or  equal  to  four 
times  the  average  deviation. 
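
The criterion just described is simple enough to state as code. The following
Python sketch follows the wording above; the function name, the `factor`
parameter, and the sample data are hypothetical and serve only as illustration.

```python
def goodwin_reject(observations, suspect_index, factor=4.0):
    """Return True if the suspect observation would be rejected.

    The mean and the average (absolute) deviation are computed with the
    doubtful observation omitted, as in the description above.
    """
    rest = [x for i, x in enumerate(observations) if i != suspect_index]
    mean = sum(rest) / len(rest)
    avg_dev = sum(abs(x - mean) for x in rest) / len(rest)
    return abs(observations[suspect_index] - mean) >= factor * avg_dev


# Hypothetical data: the last value looks doubtful, the fourth does not
data = [10.1, 9.9, 10.0, 10.2, 9.8, 11.5]
reject_last = goodwin_reject(data, suspect_index=5)
reject_fourth = goodwin_reject(data, suspect_index=3)
```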

Mansion  (1913)  summarizes  the  work  of  various  authors  on  three  methods 
applied  to  the  theory  of  errors,  which  involve  minimizing  respectively  the 
largest  error,  the  sum  of  the  absolute  values  of  the  errors,  and  the  sum  of 
squares  of  the  errors.  He  points  out  that  these  methods  result  in  choice  of 
the  midrange,  the  median,  and  the  arithmetic  mean  as  the  respective  averages; 
also  that  an  order-preserving  (or  order-reversing)  transformation  does  not 
affect  the  method  of  minimum  sum  of  absolute  values,  since  the  median  of  the 
transformed  function  is  the  transforming  function  of  the  median  of  the  original 
data,  but  does  affect  the  other  two  methods. 
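
A small numerical check of both points may be helpful. The Python sketch below
uses hypothetical data with an odd number of observations (so that the median
is one of the data values) and the exponential function as an example of an
order-preserving transformation; all names are illustrative.

```python
import math
from statistics import mean, median


def midrange(xs):
    return (min(xs) + max(xs)) / 2


data = [1.0, 2.0, 4.0, 7.0, 11.0]

# The three criteria, each as a function of the candidate value c
criteria = {
    "largest error": lambda c: max(abs(x - c) for x in data),
    "sum of absolute errors": lambda c: sum(abs(x - c) for x in data),
    "sum of squared errors": lambda c: sum((x - c) ** 2 for x in data),
}
averages = {"midrange": midrange(data), "median": median(data), "mean": mean(data)}

# For each criterion, which of the three averages gives the smallest value?
best = {
    name: min(averages, key=lambda a: f(averages[a]))
    for name, f in criteria.items()
}

# Effect of an order-preserving transformation on each average
transformed = [math.exp(x) for x in data]
median_commutes = math.isclose(median(transformed), math.exp(median(data)))
mean_commutes = math.isclose(mean(transformed), math.exp(mean(data)))
midrange_commutes = math.isclose(midrange(transformed), math.exp(midrange(data)))
```

Only the median commutes with the transformation; the mean and the midrange of
the transformed data differ from the transforms of the original mean and
midrange.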

David Brunt (1917) discusses the law of error, making reference to the
work of Gauss, Todhunter, and Glaisher. He deduces the law of error [Laplace's
first] which results from the assumption that the median is the most probable
value. He also deals with the rejection of observations. He appears to
favor use of the criterion of Wright & Hayford (1906), but he also states
that Bessel opposed rejection of any observation unless the observer is
satisfied that the external conditions produced some unusual source of error.

Warren H. Persons (1919) gives reasons for preferring the median to any
other average of link relatives in computing indices of seasonal variation.

P.  J.  Daniell  (1920)  discusses  various  measures  of  central  tendency  and  of 
dispersion.  Besides  the  usual  arithmetic  mean,  median,  standard  deviation , 
mean  numerical  deviation,  and  quartile  deviation,  he  also  discusses  discard 
averages  and  discard  deviations.  Since  the  middle  observations  (in  order  of 
magnitude)  contain  most  of  the  information  about  central  tendency  and  the 
extreme  observations  contain  most  of  the  information  about  dispersion,  he 
proposes discarding some proportion (say 50%) of the outer observations in
computing  averages  and  of  the  inner  observations  in  computing  measures  of 
dispersion. 
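
A discard average of the kind described can be sketched as follows. This
minimal Python version assumes that the discarded proportion is split equally
between the two ends of the ordered sample and that the retained observations
are weighted equally; neither detail is specified above, and the function name
and data are hypothetical.

```python
def discard_average(xs, proportion=0.5):
    """Mean of the middle observations after discarding `proportion` of the
    outer observations, half from each end (an assumption of this sketch)."""
    ordered = sorted(xs)
    k = int(len(ordered) * proportion / 2)   # number dropped from each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


# Hypothetical sample with one wild value
sample = [2, 3, 5, 6, 7, 8, 9, 40]
plain_mean = sum(sample) / len(sample)
trimmed = discard_average(sample, proportion=0.5)
```

With half of the outer observations discarded, the wild value 40 no longer
influences the average.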

Karl  Pearson  (1920)  points  out  that  Edgeworth  (1893)  was  in  error  when 
he  stated  (p.  99,  footnote)  that  the  displacements  of  the  two  quartiles  and 
the  median  are  independent;  they  are,  in  fact,  considerably  correlated.  The 
author determines the standard errors of quantiles of a sample of size N
(assumed  large)  from  any  known  population  and  the  correlations  of  pairs  of 
such  quantiles. 

R. M. Stewart (1920a) raises two objections to Peirce's criterion for
the rejection of doubtful observations. First of all, the principle as stated
by Peirce is erroneous when n (the number of observations to be rejected) is
greater than unity. Even for the case n=1, Peirce's argument is based on an
unwarranted assumption, so that the most one can say is that if all residuals
are less than the value obtained from Peirce's criterion, no observation
should be discarded. The author remarks that Chauvenet's criterion for
rejection of a single observation also contains an obvious fallacy. Stewart
(1920b) proposes a new method for the treatment of discordant observations.
Like Stone (1873b), Edgeworth (1883b), and Newcomb (1886), he assumes that the
precision is not the same for all the observations, but he simplifies matters as
much as possible by restricting the number of values of the precision constant
h to two. He proposes a weighted mean which is a function of the residuals.
He gives a method of finding the weights; once that has been done, the weighted
mean can easily be found.

Dunham Jackson (1921) shows that, for each value of p>1, there is a definite
number $x = x_p$ which minimizes the sum $S_p(x) = \sum_{i=1}^{n} |x - a_i|^p$,
where $a_1, a_2, \ldots, a_n$ are a set of real numbers. For p=2, $x_p$ is the
arithmetic mean of the a's. The limit of $x_p$, as $p \to 1$, is the median,
while the limit of $x_p$, as $p \to \infty$, is the midrange.
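
The behavior of the minimizing value for different exponents can be explored
numerically. The Python sketch below assumes that the sum in question is the
sum of the p-th powers of the absolute deviations |x - a_i|, and locates its
minimum by ternary search, which is valid because that sum is convex for p > 1.
The function name, the data (an odd number of values, so that the median is
unambiguous), and the particular exponents 1.01 and 60 standing in for the two
limits are all hypothetical choices.

```python
def lp_center(a, p, iters=200):
    """Value x minimizing sum(|x - a_i|**p) for p > 1, by ternary search."""
    def S(x):
        return sum(abs(x - ai) ** p for ai in a)

    lo, hi = min(a), max(a)   # the minimizer lies within the range of the data
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if S(m1) < S(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2


a = [1.0, 2.0, 4.0, 7.0, 11.0]   # median 4, mean 5, midrange 6
x_near_1 = lp_center(a, 1.01)    # exponent close to 1
x_2 = lp_center(a, 2)            # least squares
x_large = lp_center(a, 60)       # large exponent
```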

Dodd  (1922)  studies  the  arithmetic  mean,  the  median,  the  midrange,  and 
other  functions  (averages)  of  the  measurements  with  reference  to  their  approx¬ 
imation  to  the  so-called  true  value,  by  determining  which  has  the  greatest 
probability  density  at  the  true  value.  He  summarizes  his  results  as  follows 
(p.158):  "The  present  examination  of  functions  of  measurements  is  not  exhaus¬ 
tive.  The  general  conclusion,  however,  would  probably  be  that  in  most  cases 
where  the  law  of  error  is  symmetrical  (the  error  function  even)  the  arithmetic 
mean  is  better  than  other  functions.  The  superiority  of  the  arithmetic  mean 
becomes somewhat doubtful: (1) When the number of measurements is small. * * *
(2) When the probable error is not small compared with the arithmetic mean.
* * * (3) When the error curve,--as evidenced by the actual distribution of
measurements--falls away from its maximum with some rapidity at first, but
nevertheless persists at some distance from the maximum. In this case, the
median may be better than the arithmetic mean. * * * (4) When the error curve
is perpendicular to the axis of errors, meeting this axis at equal distances
from the origin. In this case, the average of the least and greatest
measurements [the midrange] may be better than the average of all the
measurements [the arithmetic mean]. * * * "

Ronald Aylmer Fisher (1922) advocates the use of the sample median in
estimating the central value of the Cauchy distribution, pointing out that the
distribution of the sample mean is the same as that of a single observation,
so that the sample mean is an entirely useless statistic. He also touches on
the treatment of outlying observations, and lays a firm mathematical foundation
for the method of maximum likelihood, here first given that name.
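
Fisher's point about the Cauchy distribution can be illustrated by a small
simulation. In the Python sketch below the sample size (25), the number of
repetitions (2000), the seed, and the use of the interquartile range as the
measure of spread are all arbitrary choices made for illustration; standard
Cauchy variates are generated as the tangent of a uniformly distributed angle.

```python
import math
import random
from statistics import mean, median, quantiles


def cauchy_sample(rng, n):
    """n standard Cauchy variates via the inverse distribution function."""
    return [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]


def iqr(xs):
    """Interquartile range, used because the Cauchy law has no variance."""
    q1, _, q3 = quantiles(xs, n=4)
    return q3 - q1


rng = random.Random(0)
samples = [cauchy_sample(rng, 25) for _ in range(2000)]

iqr_single = iqr([s[0] for s in samples])        # one observation per sample
iqr_mean = iqr([mean(s) for s in samples])       # sample means
iqr_median = iqr([median(s) for s in samples])   # sample medians
```

The spread of the sample means should be about the same as that of single
observations (the interquartile range of the standard Cauchy law is 2), while
the sample medians are much more tightly concentrated.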

W. L. Crum (1923), in determining the indexes of seasonal variation in
an economic series, takes the median of a series of link-relatives for a
particular month as the unadjusted index for that month, as recommended by
Persons (1919). To the observed data he fits (1) a normal curve, (2) a
Charlier Type A curve, and (3) a composition of two normal curves. He notes
that Yule (1911) has shown that if a distribution may be dissected into two
normal distributions each of half the original frequency, and if the ratio
between the two standard deviations is greater than 2.24, the median has a
smaller probable error than the mean. He endeavors to extend this result
to the case under consideration, in which the standard deviation of the larger
portion, containing about 3/4 of the total number of cases, is 4.8 times that
of the smaller portion. He concludes that for this series and for many (but
not all) economic series, the median is better than the mean.
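
The critical ratio 2.24 quoted from Yule can be checked with a short
calculation. The Python sketch below assumes that the two normal components
have a common center, that each carries half the frequency, and that the usual
large-sample formulas apply: the variance of the mean is the population
variance divided by n, and the variance of the median is 1/(4 n f^2), where f
is the density at the center. The function names are hypothetical.

```python
import math


def variance_ratio(k):
    """Large-sample var(median) / var(mean) for an equal mixture of
    N(0, 1) and N(0, k**2); the common factor 1/n cancels."""
    var_mean = (1 + k * k) / 2
    density_at_center = (1 + 1 / k) / (2 * math.sqrt(2 * math.pi))
    var_median = 1 / (4 * density_at_center ** 2)
    return var_median / var_mean


def crossover(lo=1.0, hi=10.0, tol=1e-10):
    """Ratio k of standard deviations at which the two variances are equal.

    Bisection works because the ratio decreases steadily for k > 1.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if variance_ratio(mid) > 1:   # median still worse than mean
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


k_star = crossover()
```

Under these assumptions the crossover falls close to the figure of 2.24 given
above; for k = 1 (a single normal distribution) the ratio reduces to pi/2.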

Edgeworth (1923) restates the rationale of the method which he proposed
much earlier [Edgeworth (1887b,c,1888)] for solving a redundant system of
equations and amplifies the directions for its application given by Turner
(1887). He then considers several numerical examples, and closes (pp. 1085-
1088) with the following conclusions: "The accuracy of the double Median
depends  on  much  the  same  considerations  as  those  which  relate  to  the  single 
Median.  *  *  *  The  comparison  of  the  (single  or  double)  Median  method  with  that 
of  Least  Squares  is  prejudiced  by  two  misapprehensions  exaggerating  (a)  one 
the  defects  of  the  Median,  (b)  the  other  the  merits  of  Least  Squares.  It  is 
presumed  that  determination  by  way  of  Medians  is  less  exact  because  it  some¬ 
times  leaves  the  segment  of  a  line,  or  even  (in  the  case  of  the  double  Median) 
a  space  within  which  no  unique  value  is  distinguished.  But  the  comparative 
definiteness  of  the  Arithmetic  Mean  is  illusory,  considering  that  the  deter¬ 
mination  is  liable  to  a  probable  error.  *  *  *  The  Method  of  Least  Squares  enjoys 
an undue preference in virtue of its connection with the Normal Law of Error.
For probably that law is not in general fulfilled by observations so perfectly
as to justify the preference given. The preferability varies with the
character of the observations. * * * When the curve representing the
observations is quite abnormal [non-Gaussian] it is very possible that the
Median should have the preference in respect of accuracy. * * * When the
extremities of the curve representing the crude observations are abnormally
protuberant, the Median is apt to be preferable. * * * In short, the use of the
Median (single or double) is often easier, and sometimes more accurate, than
the Method of Least Squares. * * * Altogether, we may conclude with Laplace
that, in certain cases, the Method of Situation is preferable to the Method of
Least Squares".

Edwin Bidwell Wilson (1923) re-examines the data of Crum (1923) and
reaches the following conclusions (pp. 850-851): "With the exception of the
extreme positive deviations which have not been well fitted by any of Professor
Crum's three suggestions, these data give internal evidence of following
Laplace's first law of error instead of his second law and should be fitted
to that law. By simple graphical means using arith-log [semi-log] paper an
extremely good fit [to Laplace's first law] may be had in a very few minutes'
work (all that is necessary is to plot the grouped data, draw a straight line,
and read the graph). Professor Crum's plea for the use of the median in
certain types of statistics is much reinforced by the behavior of these data
when discussed in relation to the first law of error."

J. Haag (1924) studies the precision, for samples from a normal
distribution, of the arithmetic mean, the median, and the quasi-midrange
$X = (1/2)(x_p + x_{n-p+1})$, which is equal to the midrange for p=1 and to the
median for p = [(n+1)/2]. He finds that the mean is the most precise, the
median has asymptotic relative precision 4/5, and the midrange is the least
precise, with asymptotic relative precision 0.
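
These relative precisions can be checked roughly by simulation. The Python
sketch below makes several assumptions that are not in the text: precision is
taken to be the reciprocal of the standard deviation of the estimator (under
which the large-sample value for the median is the square root of 2/pi, about
0.80), the quasi-midrange is taken to be half the sum of the p-th smallest and
p-th largest observations, and the sample size (101), number of repetitions
(3000), and seed are arbitrary.

```python
import random
from statistics import mean, pstdev


def quasi_midrange(xs, p):
    """(x_p + x_{n-p+1}) / 2 for the ordered sample, with p counted from 1."""
    s = sorted(xs)
    return (s[p - 1] + s[len(s) - p]) / 2


rng = random.Random(1)
n, reps = 101, 3000
samples = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(reps)]

sd_mean = pstdev([mean(s) for s in samples])
sd_median = pstdev([quasi_midrange(s, (n + 1) // 2) for s in samples])
sd_midrange = pstdev([quasi_midrange(s, 1) for s in samples])

rel_median = sd_mean / sd_median       # precision of median relative to mean
rel_midrange = sd_mean / sd_midrange   # precision of midrange relative to mean
```

For a sample of this size the midrange is already far less precise than the
median, though its relative precision falls toward zero only slowly as n grows.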

Jackson (1924), given a set of p simultaneous equations in n unknown
quantities (p>n), studies the question of determining values for the unknowns
so that these equations shall be approximately solved, in the sense that the
sum of the m-th powers of the absolute values of the errors is a minimum. For
m=2, this is the classical problem of least squares. The author shows that
the problem has at least one solution for every m>0 and a unique solution for
m>1. The limiting case as $m \to \infty$ is equivalent to the problem of
minimizing the maximum error.

Edmund T. Whittaker & G. Robinson (1924) give the probable errors of
the arithmetic mean and the median, as well as those of various measures of
precision, based on the m-th powers of the absolute errors (m=1,2,3,4,5,6)
and the median of the absolute deviations from the true value, all under the
assumption that the samples are drawn from a normal distribution. They discuss
the method of least squares at some length, giving an account of the
contributions of Legendre (1805), Gauss (1809,1823,1828), Laplace (1812) and
others. They mention three alternatives to the method of least squares: (1)
the method of Tobias Mayer (1750); (2) the method of minimum approximation
[Laplace (1799), Goedseels (1909), de la Vallee Poussin (1911)]; and (3) the
method of Edgeworth (1887c,1888).

Julian Lowell Coolidge (1925) devotes one chapter of his book on probability
to errors of observation. In his section on determination of the
"best value" he states the following theorem, which he attributes to Fechner
(1874): "The sum of the numerical values of the divergences of a number from
a given series of numbers will be a minimum if the number in question be the
median." He also includes sections on the Gaussian law of error, for which he
gives Gauss' first deduction [Gauss (1809)], with passing mention of that of
Hagen (1837) based on the composition of elementary errors, and on doubtful
observations.

J. O. Irwin (1925a) finds a complicated expression for the exact distribution
of the difference between the p-th and (p+1)-th individuals in order of
magnitude (from largest to smallest) of a sample of size n from a normal
distribution, together with useful approximations for differences between first
and second and between second and third individuals in order of magnitude,
which he tabulates for selected values of n. Irwin (1925b) proposes a criterion
for the rejection of observations based on these differences. Using the
approximation developed in his earlier paper [Irwin (1925a)], he calculates the
probabilities that the differences between the first and second and between the
second and third individuals should be greater than $\lambda$ times the standard
deviation of the sampled population. If these probabilities become too small,
he advocates rejecting the first (the last) or the first two (the last two)
individuals as not belonging to the same homogeneous group as the remainder, so
that a table of these probabilities for varying values of $\lambda$ provides a
criterion for the rejection of outlying observations.

Paul Lévy (1925) devotes a chapter in his book on probability to the
theory of errors. One section deals with the determination of parameters of
precision. He proposes using the interquartile distance to estimate twice
the probable error (or 0.95a, where $a = \sigma\sqrt{2}$ and $\sigma$ is the
standard deviation). He gives two methods of determining the quartiles; later
we shall present reasons for preferring a third method. Another section deals
with the method of least squares; the author does not present any alternative
methods.

Joseph Reilly, William Norman Rae & Thomas Sherlock Wheeler (1925) advocate
the use of the criterion for rejection of observations given by Wright
& Hayford (1906): "Reject each observation for which the residual exceeds 5
times the P. E. [probable error] for a single determination. Examine carefully
each observation for which the residual exceeds 3.5 times the P. E. and
reject it if any of the accompanying conditions are such as to produce lack of
confidence". The authors explain the determination of empirical constants
by the method of least squares; they do not present any alternative methods.

L. H. C. Tippett (1925) tabulates the mean, standard deviation, $\beta_1$ and
$\beta_2$ (measures of skewness and kurtosis), and values occurring with
probabilities 5% and 1% for the distribution of the largest of n individuals in
samples from a normal population for selected values of n up to 1000, as well
as the mean, standard deviation, $\beta_1$ and $\beta_2$ for the distribution of
the range. He uses his results in deciding whether or not to reject outlying
observations. Egon Sharpe Pearson (1926) extends Tippett's results.

Estienne (1926-27) proposes replacing the classical theory of errors of
observation based on the arithmetic mean and the method of least squares by
what he calls a rational theory based on the median and the method of least
(absolute) first powers, which he derives from the notion that, for a
conscientious observer, every measurement has the same subjective probability
1/2 of being too large as of being too small. Interestingly enough, he does
not take the next logical step and propose replacing the Gaussian law of error
by Laplace's first law of error. Instead, he insists that his so-called
rational theory is consistent with any one of a whole family of error laws,
including the Gaussian law, which may be used to approximate the unknown true
law. He also proposes to treat the case of systematic errors, which Gauss
(1823) specifically excluded from his treatment, by taking the median of the
medians of repeated measurements under each of several conditions. He proposes
a method of combining m linear equations in n unknowns (2n>m>n) based on his
rational theory. Even when the restriction m<2n is satisfied, his method does
not appear to be practical. He insists that the importance of an error, at
least in many cases, is proportional to its absolute value and not to its
square as postulated by Gauss (1823); that, however numerous be a set of
measurements taken to the nearest unit, they cannot provide well founded
reasons to adopt a value stated to a fraction of that unit; that the accuracy of
a set of measurements cannot be judged by their consistency, since there may
be systematic errors; and, finally, that time and money are better spent in
perfecting measuring instruments so as to obtain more accurate observations
than in increasing the number of observations.

"Student”  [William  Sealy  Gosset]  (1927)  proposes  a  criterion  for  rejection 
of doubtful observations, based on the sample range. He also extends the
work  of  Tippett  (1925)  and  E.  S.  Pearson  (1926)  on  the  distribution  of  sample 
range. 

Arthur L. Bowley (1928) gives a summary and an annotated bibliography
(74 items) of Edgeworth's contributions, including important work on the law
of error and the choice of means, with special emphasis on the median. In the
latter connection, the author writes (pp. 101-102): "The median has the
disadvantage that its standard deviation of error (1 divided by $2\sqrt{n}$
times the greatest ordinate where the area of the frequency curve is unity) is
greater by 25% than that of the arithmetic mean in the normal [curve]. This
excess is not, however, serious in the rough measurements of credibility with
which we are generally concerned in statistics, and in some non-normal curves
the median is more accurate than the arithmetic mean. *** It has the well-known
advantage that it can be computed when the measurements away from the centre
are known only roughly, and generally in graded observations interpolation is
only needed in the central parts ***. If no graduation is necessary the median
is evidently the easiest mean to compute, and if the maximum ordinate is known
its probable error can at once be written down. As is well known, the median is
that position which makes the sum of the deviations from it (all taken as
positive) a minimum, the test of least detriment suggested by Laplace. ***
Finally, if a mean is required of discordant observations, where discordance
signifies that the observations are taken from facility curves with different
moduli, there is 'a peculiar propriety in the use of the Median.' ***. For
these reasons Edgeworth attached great importance to the median in a
considerable number of problems. His advocacy extends to its use for two or more
unknowns ***." The last statement is exemplified by seven pages (pp. 103-109)
on the method of situation of Laplace (1812) [1818] and its modification (by
eliminating the restriction that the sums of positive and negative deviations
be equal in magnitude) and extension (to two or more unknowns) by Edgeworth
(1887b,c,1888,1923).

Jerzy Neyman and E. S. Pearson (1928) give an expression for the probability
integral of the range of samples of size n and tabulate the mean,
standard deviation, $\beta_1 = \alpha_3^2$ and $\beta_2 = \alpha_4$ for the
standardized range $w/\sigma$ for samples of size n = 3,4,6,10,20 from
rectangular and normal distributions.

Edwin B. Wilson & Margaret M. Hilferty (1929) re-examine an extensive
series of observations for which C. S. Peirce ["Theory of Errors of
Observations," Report of the Superintendent of the U. S. Coast Survey (for the
year ending November 1, 1870), Appendix No. 21, pp. 200-204 and Plate No. 27.
U. S. Government Printing Office, Washington, 1873] concluded that the normal
law was verified, and reach the opposite conclusion. Peirce's data consist of
about 500 observations each day for 24 different days. The authors give the
standard deviations of the median and of the mean for each day, and proceed
to compare them, reaching the following conclusions (p. 124): "The ordinary
statement based on the normal law is that the determination of the median is
25% worse than that of the mean. A comparison of the standard deviations of
the median and mean *** shows that for these observations the median is better
determined than the mean on 13 days, worse determined on 9 days, and equally
well determined on 2 days. Roughly speaking, this means that mean and median
are on the whole equally well determined." The results tend to show not only
that the data have not come from a normal distribution but that for some
distributions the median is more precise than the mean.

E. C. Rhodes (1930) gives (pp. 974-978) the following account of problems
encountered in an application in a practical situation of the method of
minimum deviations: "The writer recently was desirous of smoothing out the
fluctuations in [a] series of figures [17 pairs of values of x=-8(1)+8 and y] ***.
A parabola was fitted by the method of Least Squares. *** The result was not
considered altogether satisfactory. *** It was considered that the parabola
was a bad fit. Two reasons suggested themselves for this, first, the original
data from which the series was obtained did not involve absolutely random
fluctuations; second, the parabola might not be the best curve for use in
smoothing. *** It was *** decided to concentrate on the first consideration,
which meant that although we had obtained the parabola of best fit by the method
of least squares, yet it might not really be the best parabola which would
smooth out the fluctuations in the series. This led us to the question of what
other methods of fitting there were available, and Edgeworth's description of
the use of medians in this connexion led to the attempt to fit by the method of
Minimum Deviations. This method may be briefly described. Suppose the
equation to the parabola is $y = a_0 + a_1 x + a_2 x^2$, and the given y's are
$y_{-8}, y_{-7}, \ldots, y_{-1}, y_0, y_1, \ldots, y_8$, then instead of, as in
the method of Least Squares, making $S_{x=-8}^{8}(a_0 + a_1 x + a_2 x^2 - y_x)^2$
a minimum, we make $S_{x=-8}^{8}|a_0 + a_1 x + a_2 x^2 - y_x|$
a minimum. His [Edgeworth's] description of the method and his arguments
in its favour are briefly summarized [by] Bowley [(1928)], pp. 103 et seq.,
and are exposed by Edgeworth [(1888,1923)]. Unfortunately, Edgeworth confined
himself in the working of the method to a reliance on a diagram (he used as
illustrations the problem of two variables), which means in practice a rather
laborious piece of work, and apparently did not notice that the method could
be applied in a more simple manner. *** The simpler method is as follows:
Suppose we are dealing with a series of deviations, say, involving three
unknowns, $A_1 u + B_1 v + C_1 w + D_1$, $A_2 u + B_2 v + C_2 w + D_2$, ...,
$A_n u + B_n v + C_n w + D_n$, and we want to find values of u, v, w which make
$S_{s=1}^{n}|A_s u + B_s v + C_s w + D_s|$ a minimum. First, find
for what values of v and w the expression is a minimum when u is given by
$-(B_r v + C_r w + D_r)/A_r$, i.e. find a local minimum point in the plane
$A_r u + B_r v + C_r w + D_r = 0$, where r is any one of the values of s from
1 to n. This reduces the problem to one involving two variables only, i.e. what
values of v and w will make $S_{t=1}^{n-1}|E_t v + F_t w + G_t|$ a minimum,
where the E's, F's, G's are obtained from the A's, B's, C's, D's. To solve
this, find for what value of w the expression is a minimum when v is given by
$-(F_p w + G_p)/E_p$, i.e. find a local minimum point on the line
$E_p v + F_p w + G_p = 0$, where p is any one of the values of t from 1 to
n-1. This reduces simply to the problem of finding a weighted median.

Thus [the] point of intersection of *** three planes
[$A_l u + B_l v + C_l w + D_l = 0$, $A_m u + B_m v + C_m w + D_m = 0$,
$A_n u + B_n v + C_n w + D_n = 0$] is the true minimum point, and the values u,
v, w obtained from solving these equations make
$S|A_s u + B_s v + C_s w + D_s|$ a minimum."
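
The final step of Rhodes's reduction, finding a weighted median, can be sketched for a single unknown as follows. This is an illustration under stated assumptions and not Rhodes's own computing scheme: the function names are hypothetical, and the sketch uses the standard fact that a sum of terms |E v + G| is minimized at a weighted median of the points -G/E with weights |E|.

```python
def weighted_median(points, weights):
    """Return a value v that minimizes sum(w * |v - c|) over the (c, w) pairs.

    The lower weighted median is returned: the smallest point whose
    cumulative weight reaches half of the total weight.
    """
    pairs = sorted((c, w) for c, w in zip(points, weights) if w > 0)
    if not pairs:
        raise ValueError("need at least one positive weight")
    half = 0.5 * sum(w for _, w in pairs)
    running = 0.0
    for c, w in pairs:
        running += w
        if running >= half:
            return c


def minimize_abs_linear(E, G):
    """Minimize sum_t |E[t] * v + G[t]| over the single unknown v.

    Each term equals |E[t]| * |v - (-G[t] / E[t])|; terms with E[t] == 0
    are constants and do not affect the location of the minimum.
    """
    points = [-g / e for e, g in zip(E, G) if e != 0]
    weights = [abs(e) for e in E if e != 0]
    return weighted_median(points, weights)


def abs_sum(E, G, v):
    """Objective value, used here only to check the result."""
    return sum(abs(e * v + g) for e, g in zip(E, G))


# Hypothetical coefficients: the candidate points are 1, 3, 10 with weights 1, 2, 1.
E = [1.0, 2.0, 1.0]
G = [-1.0, -6.0, -10.0]
v_best = minimize_abs_linear(E, G)
```

In the two- and three-unknown cases described in the quotation, this one-dimensional search is applied along a line on which one of the deviations vanishes.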

Tokishige Hojo (1931) studies the distribution of the median, quartiles,
and interquartile distance in samples from a normal population. He compares
the precision of the median with that of the mean and the midrange, and
considers the problem of estimating the population standard deviation from the
interquartile distance of a sample. In defining the sample median, M, and
the sample quartiles, $Q_1$ and $Q_3$, of a sample of size n, the author
distinguishes four cases according as (i) n=4m, (ii) n=4m+1, (iii) n=4m+2,
(iv) n=4m+3, where m is an integer. His definitions for these cases are as
follows:

(i) $Q_1 = (x_m + x_{m+1})/2$, $M = (x_{2m} + x_{2m+1})/2$, $Q_3 = (x_{3m} + x_{3m+1})/2$;
(ii) $Q_1 = (x_m + x_{m+1})/2$, $M = x_{2m+1}$, $Q_3 = (x_{3m+1} + x_{3m+2})/2$;
(iii) $Q_1 = x_{m+1}$, $M = (x_{2m+1} + x_{2m+2})/2$, $Q_3 = x_{3m+2}$;
(iv) $Q_1 = x_{m+1}$, $M = x_{2m+2}$, $Q_3 = x_{3m+3}$,

where $x_i$ denotes the i-th smallest observation. The present writer prefers
to define the median and the quartiles so that they divide the population into
four intervals with equal probability 1/4, viz. $Q_1 = x_{(n+1)/4}$,
$M = x_{(n+1)/2}$, $Q_3 = x_{3(n+1)/4}$, where a fractional subscript
indicates interpolation between adjacent ordered observations. This definition
agrees with Hojo's except for the quartiles in cases (i) and (iii) above, for
which it yields (i) $Q_1 = (3x_m + x_{m+1})/4$, $Q_3 = (x_{3m} + 3x_{3m+1})/4$;
(iii) $Q_1 = (x_m + 3x_{m+1})/4$, $Q_3 = (3x_{3m+2} + x_{3m+3})/4$.
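
The (n+1)-based definition of the quartiles can be sketched in a few lines. This is a minimal illustration and not code from the report: the function names are hypothetical, linear interpolation is assumed for fractional subscripts, and the constant 1.349 (twice the normal probable error 0.6745) is an assumed round value for converting the interquartile distance of a normal sample into an estimate of the standard deviation.

```python
def order_stat(xs_sorted, k):
    """Value x_k for a 1-based, possibly fractional subscript k.

    A fractional subscript is read as linear interpolation between the
    two adjacent ordered observations.
    """
    lo = int(k)      # integer part of the subscript (k is positive)
    frac = k - lo
    if frac == 0:
        return xs_sorted[lo - 1]
    return (1 - frac) * xs_sorted[lo - 1] + frac * xs_sorted[lo]


def quartiles(sample):
    """Return (Q1, M, Q3) with subscripts (n+1)/4, (n+1)/2 and 3(n+1)/4."""
    xs = sorted(sample)
    n = len(xs)
    if n < 3:
        raise ValueError("need at least three observations")
    return (
        order_stat(xs, (n + 1) / 4),
        order_stat(xs, (n + 1) / 2),
        order_stat(xs, 3 * (n + 1) / 4),
    )


def sigma_from_iqr(sample):
    """Rough estimate of a normal standard deviation from the interquartile distance."""
    q1, _, q3 = quartiles(sample)
    return (q3 - q1) / 1.349  # assumed constant: 2 * 0.6745


# Case (i) with m = 1: Q1 = (3*x_1 + x_2)/4 and Q3 = (x_3 + 3*x_4)/4.
q1, med, q3 = quartiles([10, 20, 30, 40])
```

For n = 4m+3 the three subscripts are integers, so no interpolation takes place and the result coincides with Hojo's case (iv).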

Richard von Mises (1931), in his volume on probability and its applications,
includes a section on elementary descriptive statistics in which he
discusses the median, quartiles, deciles and percentiles in addition to the
arithmetic mean and the standard deviation. There is also a chapter on the theory
of errors and adjustment of observations in which the author follows the
approach of Gauss (1809,1823).

Karl  Pearson  (1931)  gives  tables  of  criteria  (Chauvenet's  and  Irwin's) 
for  the  rejection  of  outlying  observations  and  of  the  distribution  of  range, 
median,  and  midrange  in  samples  from  a  normal  population.  Walter  A.  Shewhart 
(1931)  discusses  the  use  of  the  median  and  the  midrange  as  measures  of  central 
tendency  instead  of  the  arithmetic  mean,  and  the  use  of  the  range  as  a  measure 
of  dispersion  instead  of  the  standard  deviation. 

Allen  T.  Craig  (1932a, b)  proves  several  useful  theorems  concerning  the 
distributions  of  the  sample  median,  mean,  midrange,  first  quartile,  and 
range.  Harold  Jeffreys  (1932)  offers  an  alternative  to  the  rejection  of 
observations.  He  takes  the  probability  of  an  error  to  be  given  jointly  by 
two  normal  [Gaussian]  laws,  one  for  the  normal  and  the  other  for  the  abnormal 
errors ,  and  provides  a  method  of  solution  for  the  five  unknowns  (means  and 
standard  deviations  for  normal  and  abnormal  errors  and  proportion  of  normal 
errors),  together  with  an  approximate  solution  by  a  method  of  weighting,  the 
weight  of  an  observation  being  a  continuous  function  of  its  deviation. 

Willem J. Luyten (1932) considers data on the differences of pairs of
measures of the distance of double stars, and concludes that Laplace's first
error curve fits the data much better than his second (the normal curve). He
points out (p. 365) that, as a corollary of the use of the first Laplacean
curve, it is no longer the arithmetic mean and the standard deviation but the
median and the arithmetic mean error [mean deviation from the median] that are
the significant constants of the distribution.

Egon S. Pearson (1932) gives a table summarizing available results on the
distribution of range in samples of 100 or less from a normal population.
He discusses the method of computation of the table and experimental checks
on the adequacy of the approximation employed, and gives illustrations of the
use of the table.

P.  R.  Crowe  (1933)  proposes  a  graphical  method,  based  on  the  median  and 
the  quartiles,  of  representing  the  distribution  of  monthly  rainfalls.  He 
compares  the  median  with  the  mean  and  the  mode ,  and  the  quartile  deviation 
with  the  standard  deviation  and  the  mean  deviation,  giving  advantages  and 
disadvantages  of  each  from  the  viewpoint  of  the  climatologist. 

A. T. McKay and E. S. Pearson (1933) develop theory which leads to certain
new results regarding the form of the range curve at its terminals and
provides the exact distribution of the range of samples of 3 from a normal
population. They also give the exact distributions of the range in samples of n
from rectangular and right triangular universes, the former having previously
been given by Neyman and Pearson (1928).

Paul Reece Rider (1933) summarizes the history of criteria for rejection
of observations from Peirce (1852) to Jeffreys (1932), and draws the following
conclusion (pp. 21-22): "From the various methods cited above it is easy to
see that devices for rejecting discordant observations could be invented
without number. The choice of which to use, if any, is largely an individual
matter. If one is willing to subscribe to the hypothesis laid down in a
given criterion, he should be willing to abide by the result of applying the
criterion to a set of data. *** In the final analysis it would seem that the
question of the rejection or retention of a discordant observation reduces
to a question of common sense. Certainly the judgment of an experienced
observer should be allowed considerable influence in reaching a decision. This
judgment can undoubtedly be aided by the application of one or more tests based
on the theory of probability, but any test which requires an inordinate amount
of calculation seems hardly to be worthwhile, and the testimony of any
criterion which is based upon a complicated hypothesis should be accepted with
extreme caution".

Hans Münzner (1934) studies the precision of the $\alpha$-th absolute moment
($\alpha > 0$) for the generalized Gaussian distribution function
$\varphi(\varepsilon) = [h\lambda / 2\Gamma(1/\lambda)]\, e^{-h^{\lambda}|\varepsilon|^{\lambda}}$,
$\lambda \ge 1$. He shows that the maximum precision is attained when
$\alpha = \lambda$. He points out that in the special case
$\alpha = \lambda = 2$, this reduces to the statement that the standard
deviation is the most precise measure of dispersion for the Gaussian
distribution, as shown by Gauss himself. Another special case, not emphasized by
the author, is $\alpha = \lambda = 1$, in which the result reduces to the
statement that the mean deviation is the most precise measure of dispersion for
Laplace's first distribution. We have already seen that the mean deviation is a
minimum when taken about the median rather than about the arithmetic mean.

Harry S. Pollard (1934) gives the following results concerning medians:
(1) an exact expression, $1/(2\sqrt{2n+3})$, for the standard deviation of the
median of samples of (2n+1) items (n an integer) from a rectangular population
with probability density function f(x)=1 over a unit interval, which compares
with the classical (large-sample) approximation $\sigma \approx 1/[2f(0)\sqrt{s}]$,
where s is the sample size and f(0) is the value of the p.d.f. at the
population median; (2) upper and lower limits for the standard deviation of the
median for samples from any population; and (3) a method of determining the
probable error (or any percentile) of the distribution of the median for samples
of size (2n+1), for which Dodd (1922) has given the p.d.f.

Maurice Fréchet (1935) compares the precision of the mean and the median.
He admits that the mean is to be preferred to the median in the majority of
cases, but insists that it is not always so: median life or median income
provides a more representative value than the corresponding mean. Certain
statisticians object to the use of the median on the ground that its precision
is less than that of the mean, which is true for the normal law of error
(Laplace's second law), but not, as the author shows, for certain other
probability laws. He studies two laws for which the median is more precise than
the mean, at least for samples of size three. Let $\mu'$ be the standard error
of the mean and $\mu''$ be that of the median. For samples of size three from
Laplace's first law of error with cumulative distribution function
$F(x) = e^{x}/2$ for $x \le 0$, $F(x) = 1 - e^{-x}/2$ for $x \ge 0$, the median
is slightly more precise [$\mu'^2 - \mu''^2 = 1/36$], while for samples of size
three from the probability law $F(x) = 0$ if $x \le 1$, $F(x) = 1 - x^{-\alpha}$
if $x \ge 1$, with $1 < \alpha \le 2$, $\mu''$ is finite but $\mu'$ is infinite.
The author presents supporting evidence from a paper by Wilson & Hilferty (1929).
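
The difference 1/36 quoted for Laplace's first law can be checked directly. The following worked calculation is a sketch and not Fréchet's own derivation; it assumes the density $f(x) = e^{-|x|}/2$ (variance 2), the usual density $6F(1-F)f$ of the median of three independent observations, and the integral $\int_0^\infty x^2 e^{-kx}\,dx = 2/k^3$.

```latex
\begin{align*}
\mu'^2  &= \frac{\operatorname{Var}(x)}{3} = \frac{2}{3}, \\
\mu''^2 &= \int_{-\infty}^{\infty} x^2 \, 6F(x)\,[1-F(x)]\,f(x)\,dx
         = 2\int_0^{\infty} x^2 \cdot 6\left(1-\tfrac{1}{2}e^{-x}\right)
           \left(\tfrac{1}{2}e^{-x}\right)\left(\tfrac{1}{2}e^{-x}\right)dx \\
        &= 12\int_0^{\infty} x^2\left(\tfrac{1}{4}e^{-2x}-\tfrac{1}{8}e^{-3x}\right)dx
         = 12\left(\tfrac{1}{16}-\tfrac{1}{108}\right) = \frac{23}{36}, \\
\mu'^2-\mu''^2 &= \frac{24}{36}-\frac{23}{36} = \frac{1}{36}.
\end{align*}
```

The first factor of 2 in the second line uses the symmetry of the distribution about zero, which also makes the expected value of the median zero.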

R. C. Geary (1935) considers the ratio $w_n$ of the mean deviation to the
standard deviation [for finite random samples from a normal population]
as a test of normality. He takes the mean deviation from the mean rather than
from the median. He notes that $\sqrt{\beta_1}$ is a test of symmetry rather
than of normality, while the frequency distribution of $\beta_2$ (a measure of
kurtosis) is unknown. E. S. Pearson (1935) compares $\beta_2$ and Geary's
criterion. He concludes that there are strong practical grounds for choosing
the latter as a test of whether the population sampled is platykurtic or
leptokurtic unless the sample is very large, when $\beta_2$ may be used, but
that $\sqrt{\beta_1}$ is the best criterion of skewness, and that two tests
($\sqrt{\beta_1}$ and either $\beta_2$ or Geary's criterion) are required for
departure from normality.

A. T. McKay (1935) studies the distribution, for samples from a normal
distribution, of $u = X - \bar{x}$, where X is the highest observation and
$\bar{x}$ is the mean. He uses the statistic u as the basis of a criterion for
rejection of outliers, and compares this criterion with the criterion of Irwin
(1925) and one based on the distribution of range tabulated by E. S. Pearson
(1932).

William R. Thompson (1935) derives the distribution of $\tau = \delta/s$,
where s is the sample standard deviation and $\delta$ is the deviation of an
arbitrary observation from the sample mean, and uses this statistic, for which
he tabulates critical values $\tau_0$, as a criterion for the rejection of
observations deviating from the mean by more than $s\tau_0$.
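
Thompson's statistic is simple to compute once a critical value is available. The sketch below is illustrative only: the function names are hypothetical, the divisor n in the sample standard deviation is an assumption about the definition intended, and the critical value `tau0` must be taken from Thompson's table, since it is not computed here.

```python
import math


def thompson_tau(sample):
    """Return tau = delta / s for every observation in the sample.

    Assumption: s is the root-mean-square deviation from the sample mean
    (divisor n); with divisor n - 1 the values would be slightly smaller.
    """
    n = len(sample)
    mean = sum(sample) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in sample) / n)
    return [(x - mean) / s for x in sample]


def flag_for_rejection(sample, tau0):
    """Observations deviating from the mean by more than s * tau0."""
    return [x for x, t in zip(sample, thompson_tau(sample)) if abs(t) > tau0]


observations = [1.0, 2.0, 3.0, 4.0, 10.0]
taus = thompson_tau(observations)
# tau0 = 1.8 is a made-up threshold for illustration, not a tabulated value.
suspects = flag_for_rejection(observations, tau0=1.8)
```

With the divisor n, no value of tau can exceed the square root of n-1 in absolute value, which bounds the usable critical values for small samples.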

Georges Darmois (1936) discusses the relative precision of the median and
the  arithmetic  mean  of  samples  from  various  populations,  and  the  related  ques¬ 
tion  of  whether  to  use  the  method  of  least  squares  or  the  method  of  least 
absolute  first  powers. 

E. S. Pearson and C. Chandra Sekar (1936) study the criterion for rejection
of outlying observations proposed by Thompson (1935). They point out that
it provides complete control over the probability of type I error (rejecting
the hypothesis that all the observations have been drawn from a single normal
population, with unspecified mean and standard deviation, when that hypothesis
is true). Nevertheless, they show that the criterion is quite inefficient
in the presence of two or more outliers unless the sample is quite large.

Emil J. Gumbel (1937) studies the precision of the arithmetic mean and
of the median, and verifies that the former is more precise for uniform,
Gaussian, and extreme-value distributions, and the latter for the symmetric
double exponential (Laplace's first) distribution.

Curtis Bruen (1938) considers various methods of combining observations
based on the concept of power-means, as defined by Fechner (1874). The p-th
order power-mean of a set of observations, $x_i$ (i=1,2,3,...,n), is that value,
x, which makes the sum, $\sum |x_i - x|^p$, a minimum. It is well known that the
median is the first-order power-mean, the arithmetic mean is the second-order
power-mean, and the midrange is the limiting value of the p-th order power-mean
as $p \to \infty$. Not so well known is the fact, which the author attributes to
R. M. Foster (1922), that the mode is the limiting value of the p-th order
power-mean as $p \to 0$. The author generalizes the concept of the power-mean
from the case of direct observations to that of indirect observations or of
implicit functional observations, for which it leads to the method of least
power-sums of the absolute values of the deviations. Corresponding to mode,
median, mean, and midrange one has then the methods of least number (least sum
of zero powers), least sum of first powers, least sum of squares, and least
maximum (least sum of infinite powers) of the absolute deviations. Jackson (1924)
has studied the existence and the uniqueness of solutions by the method of
least p-th powers ($0 < p \le \infty$) of sets of n simultaneous linear
equations in m unknowns, when m<n. Bruen reviews the contributions to the theory
of errors of Mayer, Boscovich, Laplace, Legendre, Gauss, Cauchy, Glaisher,
Fechner, Edgeworth, Turner, Goedseels, de la Vallee Poussin, Rhodes and others.
He closes with a discussion as to the choice of method, in which he points out
that the choice depends on the presumed distribution of deviations, each
method being best for a particular distribution: the mode in one variable or
the modal point in two or more variables for a spike distribution (single
isolated value), the median or median loci for a symmetric exponential (first
Laplacean) distribution, the mean or mean loci for a normal (Gaussian or second
Laplacean) distribution, and the midrange or midpoint of least range for a
uniform (rectangular) distribution.
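
The relation between the order p and the familiar averages can be illustrated numerically. The sketch below is not taken from Bruen's paper: the function name `power_mean` is hypothetical, and a ternary search is an assumed, convenient choice that is valid only for p >= 1, where the sum of p-th powers of the absolute deviations is a convex function of x.

```python
def power_mean(sample, p, tol=1e-10):
    """Value x minimizing sum(|x_i - x| ** p), found by ternary search.

    Restricted to p >= 1, where the objective is convex; the limiting
    case p -> 0 (the mode) is not covered by this sketch.
    """
    if p < 1:
        raise ValueError("this sketch handles only p >= 1")
    lo, hi = min(sample), max(sample)

    def cost(x):
        return sum(abs(xi - x) ** p for xi in sample)

    # Shrink the bracket by one third per step, keeping a minimizer inside.
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if cost(m1) <= cost(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


data = [1.0, 2.0, 4.0, 9.0, 14.0]
first_order = power_mean(data, 1)     # close to the median, 4
second_order = power_mean(data, 2)    # close to the arithmetic mean, 6
high_order = power_mean(data, 200)    # close to the midrange, 7.5
```

For an even number of observations and p = 1 the minimum is attained on a whole interval, so the search returns some point between the two middle observations rather than a unique value.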

E.  L.  Dodd  (1938)  reproduces  some  of  the  results  of  Jackson  (1923)  con¬ 
cerning  the  median,  quartiles,  and  other  positional  means  (quantiles)  and 
enumerates some of the properties of these measures.

Jose Barral Souto (1938) shows that the mathematical expression
$M_h = \left(\sum_i p_i a_i^h\right)^{1/h}$, where the $a_i$ are non-negative real
numbers and the $p_i$ are positive weights whose sum is 1, leads for particular
values of h to the following "means": $h = -\infty$ gives the smallest of the
$a_i$, $h = -1$ gives the (weighted) harmonic mean, $h = 0$ gives the (weighted)
geometric mean, $h = 1$ gives the (weighted) arithmetic mean, $h = 2$ gives the
(weighted) quadratic mean, and $h = \infty$ gives the largest of the $a_i$. If the
$a_i$ are replaced by their absolute deviations from a certain value of x, $M_h$
is transformed into $M_h(x) = \left(\sum_i p_i |a_i - x|^h\right)^{1/h}$. Souto
shows that the values of x which minimize $M_h(x)$ are the following for
particular values of h: for $h \to 0$, the mode; for $h = 1$, the (weighted)
median; for $h = 2$, the (weighted) arithmetic mean; and for $h = \infty$, the
midrange. The corresponding minimum values of $M_h(x)$ are respectively the
geometric mean deviation (zero), the mean (absolute) deviation, the
root-mean-square deviation (standard deviation), and the semirange. These results
are a generalization and extension of those given by Bruen (1938) and earlier
authors.
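
The correspondence between the exponent h and the minimizing average can be
checked numerically. The following is only an illustrative sketch, not Souto's
derivation: it assumes the weighted power-mean deviation
(sum of p_i |a_i - x|^h)^(1/h), uses a crude grid search in place of an
analytic argument, and the function names, data and grid size are hypothetical.

```python
import math

def power_mean_deviation(x, values, weights, h):
    """(sum_i p_i * |a_i - x|**h) ** (1/h) for finite h > 0; max |a_i - x| for h = inf."""
    devs = [abs(a - x) for a in values]
    if math.isinf(h):
        return max(devs)
    return sum(p * d ** h for p, d in zip(weights, devs)) ** (1.0 / h)

def grid_minimizer(values, weights, h, steps=4000):
    """Grid point between min and max of the data with the smallest deviation (assumed search)."""
    lo, hi = min(values), max(values)
    grid = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    return min(grid, key=lambda x: power_mean_deviation(x, values, weights, h))

values = [1.0, 2.0, 4.0, 7.0, 11.0]   # illustrative data, equal weights
weights = [0.2] * 5

x_h1 = grid_minimizer(values, weights, 1)           # expected near the median
x_h2 = grid_minimizer(values, weights, 2)           # expected near the arithmetic mean
x_hinf = grid_minimizer(values, weights, math.inf)  # expected near the midrange
```

For these five values the median is 4, the arithmetic mean is 5 and the
midrange is 6, so the three minimizers should land on or next to those points.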

Gumbel (1939) studies the determination of the median, quartiles, and
other quantiles from small samples. He takes as the three quartiles the
[(n+1)/4]th, [(n+1)/2]th, and [3(n+1)/4]th values from below, where n is the
sample size. In an analogous manner, he takes as the quantile corresponding
to cumulative probability λ the [(n+1)λ]th value from below.
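
A minimal sketch of this positional rule follows. The rank (n+1)λ is taken
from the text; the linear interpolation used when that rank is not a whole
number is an assumption added here for illustration, and the function names
are hypothetical.

```python
def gumbel_position(n, lam):
    """Rank counted from below (starting at 1) for cumulative probability lam: (n + 1) * lam."""
    return (n + 1) * lam

def sample_quantile(sample, lam):
    """Value at rank (n + 1) * lam in the ordered sample."""
    xs = sorted(sample)
    n = len(xs)
    pos = gumbel_position(n, lam)
    if pos < 1 or pos > n:
        raise ValueError("rank (n + 1) * lam falls outside 1..n for this sample size")
    lower = int(pos)      # whole part of the rank
    frac = pos - lower    # fractional part of the rank
    if frac == 0:
        return xs[lower - 1]
    # Assumption (not from the text): interpolate between neighbouring order statistics.
    return xs[lower - 1] + frac * (xs[lower] - xs[lower - 1])

data = [3, 1, 4, 1, 5, 9, 2]                                  # n = 7, illustrative
quartiles = [sample_quantile(data, p) for p in (0.25, 0.5, 0.75)]
```

With n = 7 the ranks are 2, 4 and 6, so the quartiles are simply the second,
fourth and sixth ordered values.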

Jeffreys (1939) discusses the use of the median instead of the mean and
the rejection of observations. He points out that the mean is the best average
only when the underlying distribution is normal, in which case the standard
deviation is the best measure of dispersion; in the same way, the median is
associated with Laplace's first distribution and the mean (absolute) deviation.
He offers the opinion that there is much to be said for the use of the median
when the form of the law of distribution is unknown because it is less affected
by a few abnormally large residuals than is the arithmetic mean. He criticizes
Peirce's and Chauvenet's criteria for the rejection of observations, and offers
a modified form of the alternative which he proposed earlier [Jeffreys (1932)],
with a table of weights for its implementation.

Niels Arley (1940) generalizes the results of W. R. Thompson (1935). Let
$x_1, x_2, \ldots, x_n$ be independent normal variates with mean
$E(x_i) = \sum_{j=1}^{m} a_{ij}\beta_j$, where the a's are known coefficients and
the $\beta$'s unknown parameters. Let the variance of $x_i$ be
$\sigma_i^2 = \sigma^2/p_i$, where $\sigma^2$ is unknown but the weights $p_i$ are
known. Finally let $\hat{x}_i$ denote the estimate of $E(x_i)$ and $s_i^2$ the
estimate of the variance of $x_i$, both obtained by the method of least squares.
Arley shows that the probability density function of
$r_i = (x_i - \hat{x}_i)/s_i$ is given by
$p(r) = \mathrm{const.}\,(n-m-r^2)^{(n-m-3)/2}$ for $|r| \le \sqrt{n-m}$. He
applies this result to obtain a criterion for the rejection of observations,
which he compares with criteria proposed by various other authors.

Frechet (1940a) compares certain measures of dispersion of the sample
median and the sample mean for large samples from unimodal distributions with
finite variance. Let X be a random variable with unimodal distribution,
$\sigma_X$ its standard error, $e_X$ its mean absolute error, and $E_X$ its
quartile deviation, and let M be the median and V the arithmetic mean of a
sample from this distribution. Frechet shows that for sufficiently large n,
(i) $\sigma_M \le 1.74\,\sigma_X/\sqrt{n}$, (ii) $e_M \le 1.60\,e_X/\sqrt{n}$,
(iii) $E_M \le 1.35\,E_X/\sqrt{n}$. For V, there is the equality
$\sigma_V = \sigma_X/\sqrt{n}$ corresponding to (i), but there are no
inequalities for V corresponding to (ii) and (iii). Frechet concludes that the
sample median should be more widely used except when the distribution is known
to be such that the sample mean is better. Frechet (1940b) obtains the
following inequalities for the measures of dispersion of the random variable X
itself, the notation being the same as in the preceding paper:
$0 < e_X/\sigma_X \le 1$; $0 \le E_X/e_X \le 2$. If $A_X$ is the upper bound of
the mean probability density of X between $x_1$ and $x_2$,
Prob{$x_1 < X < x_2$}/($x_2 - x_1$), when $x_1$, $x_2$ vary, then he shows that
the following inequalities hold: $0 \le 1/(A_X e_X) \le 4$;
$0 \le 1/(A_X \sigma_X) \le 2\sqrt{3}$; $0 \le 1/(A_X E_X) \le 4$. Moreover, he
shows that none of the twelve inequalities among $\sigma_X$, $e_X$, $E_X$ and
$A_X$ can be replaced by a sharper one.

Edward Paulson (1940) finds the distribution of the median of a random
sample of size (2n+1), where n is an integer, from a symmetric population (whose
mean and median are both zero) in terms of the incomplete Beta function, which
has been tabulated by Karl Pearson. He suggests that his results are especially
useful in sampling from populations such as the Cauchy distribution, for which
the mean is not a consistent statistic.

Robert R. Singleton (1940) points out that the method given by Rhodes
(1930) for estimation of the parameters in a regression equation by minimizing
the sum of absolute values of deviations is iterative and recursive, and is
presented without proof. He uses geometric methods and terminology to develop
proofs for various methods and to obtain a new method which reduces the labor
by eliminating the recursive feature.

Abraham Wald (1940) considers two sets of random variables $x_i$ and $y_i$
(i = 1, ..., N; N even), observed with error. Neither the true values $X_i$,
$Y_i$ nor the coefficients $\alpha$ and $\beta$ of the linear relation
[$Y_i = \alpha X_i + \beta$] between them is known. As an estimate of $\alpha$
[cf. Lambert (1765a)] he uses $a = a_2/a_1$, where
$a_1 = [(x_1 + \cdots + x_m) - (x_{m+1} + \cdots + x_N)]/N$,
$a_2 = [(y_1 + \cdots + y_m) - (y_{m+1} + \cdots + y_N)]/N$ and m = N/2. He points
out that the greater $|a_1|$ the more efficient is the estimate a of $\alpha$,
and that $|a_1|$ is a maximum when one orders the observations so that
$x_1 \le x_2 \le \cdots \le x_N$.
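
The grouping estimate of the slope can be sketched in a few lines. This is an
illustration under stated assumptions rather than Wald's own procedure: the
observations are ordered by x, split into lower and upper halves, and the
slope is taken as the ratio of the y half-sum difference to the x half-sum
difference. The function name and the sample data are hypothetical.

```python
def wald_slope(xs, ys):
    """Slope estimate a2 / a1 from the two halves of the data ordered by x."""
    if len(xs) != len(ys) or len(xs) == 0 or len(xs) % 2 != 0:
        raise ValueError("need an even, equal number of x and y observations")
    pairs = sorted(zip(xs, ys))   # order the observations by x
    n = len(pairs)
    m = n // 2
    a1 = (sum(x for x, _ in pairs[:m]) - sum(x for x, _ in pairs[m:])) / n
    a2 = (sum(y for _, y in pairs[:m]) - sum(y for _, y in pairs[m:])) / n
    if a1 == 0:
        raise ValueError("the two halves have equal x totals; slope is undefined")
    return a2 / a1

slope = wald_slope([3.0, 0.0, 2.0, 1.0], [7.0, 1.0, 5.0, 3.0])   # points on y = 1 + 2x
```

Because only the two half-sums enter, the common factor 1/N and the sign
convention of the differences cancel in the ratio.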

E. B. Wilson (1940) points out that the usual formula for the standard
error of the median of random samples of size n,
$\sigma_M = 1/(2\phi_0\sqrt{n})$, where $\phi_0$ is the value of the probability
density function at the median, unlike the corresponding formula
$\sigma/\sqrt{n}$ for the mean, is not universally valid. He explores
various pathological cases for which it gives incorrect results. He then sets
out to find a true expression for $\sigma_M$, restricting himself to samples of
odd size n = 2k+1 (k an integer) from a population that is symmetric about the
origin, with probability density function $\phi(x)$ and cumulative distribution
function $\Phi(x)$ [not the author's notation]. The probability density
function of the median is then
$\psi(x) = [(2k+1)!/(k!)^2]\,[\Phi(x)]^k [1-\Phi(x)]^k \phi(x)$; its mean is zero,
and its variance is given by $\sigma_M^2 = \int x^2 \psi(x)\,dx$, where the
integration extends over the whole range of the function $\phi(x)$. The author
applies this result to show that the standard error of the median of samples of
size n from the Cauchy distribution $\phi(x) = 1/[\pi(1+x^2)]$ is infinite for
n = 3, but finite for n = 5, 7, ....

R. J. Brookner (1941) shows that, for a random sample of size N from a
rectangular population of length 1 around an unknown value $\theta$ [that is,
f(x) = 1 if $\theta - 1/2 \le x \le \theta + 1/2$, f(x) = 0 otherwise], the
variance of the sample midrange t is 1/[2(N+1)(N+2)]. This compares with a
variance of 1/12N for the sample mean $\bar{x}$. Both the mean and the midrange
are unbiased estimators of $\theta$. The ratio of the variance of t to that of
$\bar{x}$ is 6N/[(N+1)(N+2)], which is less than one for N>2 and approaches zero
as $N \to \infty$, so the mean is a poor estimator of central tendency for a
rectangular population.
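
The two variance formulas quoted above are easy to tabulate. The sketch below
simply evaluates them in exact rational arithmetic; the function names are
hypothetical, and nothing beyond the formulas in the text is assumed.

```python
from fractions import Fraction

def midrange_variance(n):
    """Variance of the sample midrange for a rectangular population of length 1."""
    return Fraction(1, 2 * (n + 1) * (n + 2))

def mean_variance(n):
    """Variance of the sample mean for the same population: 1 / (12 n)."""
    return Fraction(1, 12 * n)

def variance_ratio(n):
    """Ratio of midrange variance to mean variance, equal to 6n / ((n + 1)(n + 2))."""
    return midrange_variance(n) / mean_variance(n)

ratios = {n: variance_ratio(n) for n in (1, 2, 3, 10, 100)}
```

The ratio equals one for N = 1 and N = 2 (where the two estimators coincide)
and then falls steadily, for example to 9/10 at N = 3 and 5/11 at N = 10.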

Maurice Frechet (1941) examines two methods of demonstrating the validity
of the normal law of error: the method of Gauss (1809) based on the
postulate that the arithmetic mean is the best average of a set of equally
reliable observations and the method [due to Hagen (1837), whom the author
does not mention] based on the composition of a large number of small
elementary errors. He does not find either method convincing, and concludes that
verification is possible only experimentally, by comparison with actual data.
This can be accomplished in various ways: by comparing theoretical and observed
frequencies in the various classes of a grouped frequency distribution;
by computing the Pearsonian measure of kurtosis $\beta_2 = \mu_4/\mu_2^2$, where
$\mu_k$ is the kth moment about the mean, for the data and comparing it with the
theoretical value [3 for the normal (second Laplacean) law, 6 for the first
Laplacean law]; or by computing the ratios $D = (q_2-q_1)/(d_2-d_1)$ and
$C = (q_2-q_1)/(c_2-c_1)$, where $q_1$ and $q_2$ are the first and last
quartiles, $d_1$ and $d_2$ are the first and last deciles, and $c_1$ and $c_2$
are the first and last centiles, and comparing them with the theoretical values
[D = 0.5263 and C = 0.2899 for Laplace's second law, D = 0.4307 and C = 0.1772
for Laplace's first law]. The author applies the latter method to data on
artillery fire. Returning to theory, he shows the correspondence between laws
of error and measures of central tendency and dispersion, the arithmetic mean
and the standard deviation corresponding to Laplace's second law and the median
and the mean absolute deviation (from the median) to Laplace's first law, etc.
He extends this correspondence to the multivariate case (two or more
dimensions).
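
The theoretical values of the two ratios can be reproduced from the quantile
functions of the two laws. The sketch below assumes that d and c denote the
first and last deciles and centiles, takes Laplace's second law to be the
standard normal and his first law to be the density 0.5*exp(-|x|), and uses
hypothetical function names throughout.

```python
import math
from statistics import NormalDist

def normal_quantile(p):
    """Quantile of the standard normal (Laplace's second law)."""
    return NormalDist().inv_cdf(p)

def laplace_quantile(p):
    """Quantile of the double-exponential density 0.5 * exp(-|x|) (Laplace's first law)."""
    if p < 0.5:
        return math.log(2.0 * p)
    return -math.log(2.0 * (1.0 - p))

def spread_ratios(quantile):
    """Return (D, C): interquartile distance over interdecile and intercentile distances."""
    q = quantile(0.75) - quantile(0.25)
    d = quantile(0.90) - quantile(0.10)
    c = quantile(0.99) - quantile(0.01)
    return q / d, q / c

normal_D, normal_C = spread_ratios(normal_quantile)
laplace_D, laplace_C = spread_ratios(laplace_quantile)
```

Both ratios are free of location and scale, which is what makes them usable
as a check on the form of the law alone.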

H. O. Hartley (1942) writes the probability integral of the range W
in random samples of size n from a population with probability density
function f(x) in the form
$P_n(W) = n \int_{-\infty}^{\infty} f(\xi) \left( \int_{\xi}^{\xi+W} f(x)\,dx \right)^{n-1} d\xi$.
Hartley and E. S. Pearson (1942), using numerical integration, tabulate
$P_n(W)$ to 4 decimal places for samples of size n = 2(1)20 from a standard
normal population at intervals of 0.05 in W.

K.  Raghavan  Nair  and  M.  P.  Shrivastava  (1942)  propose  a  simple  method 
of  curve  fitting  by  grouping  the  residuals  into  as  many  groups  as  there  are 
unknown  coefficients  to  be  estimated,  and  using  the  group  averages  to  estimate 
the  coefficients. 

George  A.  Barnard  (1943)  studies  the  use  of  the  median  in  place  of  the 
mean in quality control charts.

Samuel  S.  Wilks  (1943)  gives  an  excellent  textbook  treatment  of  order 
statistics  and  functions  of  order  statistics,  including  the  largest  or  smallest 
sample value, the sample median, and the sample range.

R.  C.  Geary  (1944)  compares  the  mean,  the  midrange,  and  the  median  as 
measures of central tendency for a rectangular population with known range 1.

E. J. Gumbel (1944) studies the moment characteristics of the distribution,
in a sample of size n, of the mth largest and smallest values and of their
sum [÷2] and difference, the mth midrange and mth range. He assumes that n is
so large that the two values may be regarded as independent. For m = 1,
the mth midrange and mth range [or quasi-midrange and quasi-range, as they are
more commonly called] become simply the midrange and the range.

Herbert Robbins (1944) determines the expected values of the difference
between the largest and smallest order statistics [the range R] and of the
difference F between the values of the cumulative distribution function
evaluated for the largest and smallest order statistics. The results are
$E(F) = (n-1)/(n+1)$ and
$E(R) = \int_{-\infty}^{\infty} \{1 - f^n(t) - [1-f(t)]^n\}\,dt$, where f is the
cumulative distribution function [not the author's notation]. From the latter
it follows that the expected value of the range for n = 3 is always 3/2 that for
n = 2, since $1 - f^3 - (1-f)^3 = (3/2)[1 - f^2 - (1-f)^2]$.
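
The 3/2 relation can be checked numerically for a particular population. The
sketch below is an illustration only: it takes the function in the integrand
to be the cumulative distribution function, uses the standard normal as an
assumed example, and approximates the integral with a plain trapezoidal rule
over a finite interval; the names and step count are hypothetical.

```python
from statistics import NormalDist

def expected_range(cdf, n, lo=-10.0, hi=10.0, steps=20000):
    """Trapezoidal approximation of the integral of 1 - F**n - (1 - F)**n over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        f = cdf(lo + k * h)
        term = 1.0 - f ** n - (1.0 - f) ** n
        total += term if 0 < k < steps else term / 2.0   # half weight at the end points
    return total * h

normal_cdf = NormalDist().cdf
range_n2 = expected_range(normal_cdf, 2)
range_n3 = expected_range(normal_cdf, 3)
ratio = range_n3 / range_n2
```

For the standard normal the expected range of two observations is 2/sqrt(pi),
about 1.128, which gives an independent check on the n = 2 value.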

E. S. Pearson, H. J. Godwin and H. O. Hartley (1945) study and tabulate
the probability integral of the mean deviation (from the arithmetic mean) of
samples from a normal distribution.

5.  THE  MODERN  ERA  (1946-1972) 

George A. Baker (1946) studies the distribution of the ratio of sample
range to sample standard deviation in samples from normal distributions and
from two different combinations of two normal distributions, one symmetrical
but distinctly bimodal and the other weakly bimodal but strongly skewed. He
tabulates various moment constants of the distribution for various sample sizes.
He finds that the correlation between standard deviation and range of the same
sample is negligible for samples of size n = 100 from the normal population, but
not from the combinations.

George W. Brown and John W. Tukey (1946) study the distribution of sample
means for samples from various distributions, including "long tailed" ones,
for which they find that the distance between any two percentage points of
the mean of a sample of size n is ultimately larger than a positive power of
n. They claim that these results show that (1) "the use of the mean of a
sample as a measure of location * * * implies a belief that the tails of the
underlying distribution are not too long; (2) it is probable that the relative
efficiencies of mean and median are greatly affected by the length of the tail".

A. George Carlton (1946) shows that the range and midrange of a sample
from a rectangular distribution are a pair of sufficient statistics, and
maximum likelihood estimates, for the true range and true mean. He derives
exact and limiting distributions of midrange, range, and their ratio, and
calculates the 'efficiencies' of the sample mean and median as estimates of
the true mean. The limiting distributions are non-normal, with standard error
of order $n^{-1}$ instead of the usual $n^{-1/2}$. For the one-parameter
rectangular distribution $f(x) = 1/\lambda$, $0 \le x \le \lambda$, he finds
that the largest observation v "is a sufficient statistic and is evidently the
maximum likelihood estimate of $\lambda$".

Harald Cramer (1946) gives an excellent advanced treatment of the mathematical
theory of statistics, including measures of central tendency (location)
and  of  dispersion  and  the  method  of  least  squares  and  rival  methods.  Since  all 
measures  of  location  and  dispersion  are  to  a  large  extent  arbitrary,  each 
measure  having  its  own  advantages  and  disadvantages  in  various  cases,  and  since 
the  principle  of  least  squares  is  associated  with  specific  measures  (mean  and 
standard  deviation) ,  Cramer  states  that  there  is  no  logical  necessity  for 
adopting  this  principle.  On  the  contrary,  he  says,  it  is  largely  a  matter  of 
convention whether we choose to do so or not, the main reason in favor of the
principle being the relative simplicity of the rules of operation to which
it leads.

Joseph  F.  Daly  (1946)  proves  that,  for  samples  from  a  normal  population, 
the mean and the range (or any other symmetric function of the sample variates
which  is  invariant  under  a  translation  of  the  origin)  are  statistically  inde¬ 
pendent. 

E. J. Gumbel (1946) shows that in a sample of size n (large) the mth
observation from one extreme and the kth from the other in order of magnitude
may  be  regarded  as  independent  provided  that  m  and  k  are  small  with  respect 
to  n  and  that  the  population  behaves  in  its  tails  in  a  certain  exponential 
manner. 

Maurice George Kendall (1946) gives a thorough treatment of the theory of
linear and curvilinear regression. He points out that the most important use
of least squares in statistical theory is in estimating the parameters
(coefficients) in regression equations. He also mentions its use in estimating the
parameters  of  statistical  distributions,  which  will  not  be  considered  in  detail 
in  this  report. 

Frederick Mosteller (1946) suggests that certain "inefficient" statistics
may be useful when data are inexpensive compared with the cost of computing
"efficient" statistics. In particular, he proposes the use of linear
combinations of order statistics, which he calls systematic statistics, to estimate
the mean and standard deviation of a normal population. He compares the
efficiencies of the estimates of standard deviation with those of other estimates
which do not involve sums of squares or products, including the mean deviations
about the mean and about the median.

Frank Ephraim Grubbs and Chalmers L. Weaver (1947) study the use of
group ranges to estimate the population standard deviation from a sample from
a normal population. They tabulate the moment constants (mean, standard
deviation and $\alpha_3$) of the range for samples of size n = 2(1)12 from a
normal population.

E. Lord (1947) proves that the mean and the difference between the pth
and qth order statistics of a sample of size n (which reduces to the range
when p = 1, q = n) from a normal population are independent.

K. R. Nair (1947) shows that the standard error of the mean deviation m'
from the median is equal to or less than that of the mean deviation m from the
mean for samples of 3 or 4 from a normal population. He suggests that, in
view of greater simplicity in calculation, there would be strong practical
grounds for using m' rather than m if expressions for the mean and variance of
m' and tables of its probability integral were worked out and if the efficiency
of m' relative to m for sample size n>4 were found to be not appreciably worse
than for n = 4.

R. L. Plackett (1947) determines an upper limit, independent of the form
of the distribution, for the ratio $d_n$ of the expected range in samples of size
n to the population standard deviation. This limit is
$n\{2[(2n-2)! - ((n-1)!)^2]/(2n-1)!\}^{1/2}$, which is approximately $n^{1/2}$
for large n. Plackett finds distributions for which the limit is attained; for
n = 2, 3 the distributions are rectangular.
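
The limit and the statement about rectangular distributions can be checked
directly. The sketch below evaluates the bound as read from the text and
compares it with the ratio for a rectangular population, for which the
expected range of n observations is (n-1)/(n+1) times the width and the
standard deviation is the width divided by sqrt(12). The function names are
hypothetical.

```python
import math

def plackett_bound(n):
    """Upper limit n * sqrt(2 * [(2n-2)! - ((n-1)!)**2] / (2n-1)!) for E(range) / sigma."""
    if n < 2:
        raise ValueError("the range needs at least two observations")
    numerator = 2 * (math.factorial(2 * n - 2) - math.factorial(n - 1) ** 2)
    return n * math.sqrt(numerator / math.factorial(2 * n - 1))

def rectangular_ratio(n):
    """E(range) / sigma for a rectangular population: ((n-1)/(n+1)) * sqrt(12)."""
    return (n - 1) / (n + 1) * math.sqrt(12.0)

table = [(n, plackett_bound(n), rectangular_ratio(n)) for n in range(2, 8)]
```

The two columns agree for n = 2 and n = 3 and separate from n = 4 onward,
where the rectangular value falls below the bound.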

Warren B. Purcell (1947) proposes saving time in life tests by using the
median instead of the mean to indicate shifts in central tendency and the
minimum value (first order statistic) instead of the range to indicate shifts
in dispersion, thus making it possible to terminate the test as soon as [n/2]
+ 1 failures have occurred, where n is the number of items placed on test, and
[n/2] is the largest integer less than or equal to n/2.

Seaman  J.  Tanenhaus  (1947)  proposes  the  use  of  the  lot  median  or,  better 
still, the average median of several sublots, as the most typical value of
abrasion-resistance of yarns from distributions which are decidedly positively
skewed,  for  which  the  mean  tends  to  be  atypical,  being  unduly  affected  by  the 
extremes. 

Churchill Eisenhart, Lola S. Deming and Celia S. Martin (1948a) show that
the abscissa of the (one-tail) $\varepsilon$-probability point of the
distribution of the median in random samples of size n = 2m+1 from any
continuous distribution is identical with that of the P-probability point of
the parent distribution, where
$\sum_{k=(n+1)/2}^{n} C_k^n P^k (1-P)^{n-k} = \varepsilon$ and
$C_k^n = n!/[k!(n-k)!]$ is the number of combinations of n things taken k at a
time. Eisenhart, Deming and Martin (1948b) compare the
$\varepsilon$-probability points, for various values of $\varepsilon$ and n, of
the median with those of the mean for samples from normal (Gaussian), Cauchy
and double-exponential (Laplace's first) distributions and with those of the
midrange for the rectangular (uniform) distribution. Their results give
numerical verification of the fact that the mean is the best average for the
normal distribution, the median for the double-exponential distribution, and
the midrange for the rectangular distribution, while the median is the best
of the three considered for the Cauchy distribution.
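
The binomial relation between the probability point of the median and that of
the parent can be solved numerically for P. The sketch below is an
illustration, not the authors' computation: it assumes odd n, reads the
relation as the probability that at least (n+1)/2 of n observations fall below
the parent's P-probability point, and solves it by bisection. The function
names and tolerance are hypothetical.

```python
import math

def median_tail_probability(p, n):
    """P(at least (n+1)/2 of n observations fall below the parent's p-probability point)."""
    start = (n + 1) // 2
    return sum(math.comb(n, k) * p ** k * (1.0 - p) ** (n - k) for k in range(start, n + 1))

def parent_probability_for_median(eps, n, tol=1e-12):
    """Solve median_tail_probability(P, n) = eps for P by bisection (odd n, 0 < eps < 1)."""
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie strictly between 0 and 1")
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if median_tail_probability(mid, n) < eps:   # the sum increases with P
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

p_for_n3 = parent_probability_for_median(0.05, 3)
```

Given P, the probability point of the median is read off as the
P-probability point of whatever parent distribution is of interest.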

G. W. Housner and J. F. Brennan (1948) consider the problem of bivariate
regression in which both variables are subject to error and have a finite
number of means falling on a line and in which the number of sample
observations taken about each mean is known. They estimate the slope b of the
regression line Y = a + bX as the total of the differences of all pairs of
observed values of the y's divided by the like total for the observed x's, and
show that this estimate is consistent. For the case of ungrouped data, the
proposed estimate reduces to
$\hat{b} = \sum_i y_i (i - \bar{i}) / \sum_i x_i (i - \bar{i})$, where the x's are
ordered according to magnitude. In a particular numerical example, the authors
show that this estimate compares favorably with others that have been proposed.
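
For ungrouped data the pairwise-difference estimate and its rank-weighted
form can be compared directly. The sketch below assumes that the pairs are
taken in order of increasing x and that the weights are the ranks measured
from their average (n+1)/2; the function names and data are hypothetical and
the code is an illustration rather than the authors' procedure.

```python
from itertools import combinations

def pairwise_difference_slope(xs, ys):
    """Total of y differences over all pairs divided by the like total of x differences."""
    pairs = sorted(zip(xs, ys))   # order by x so every x difference is non-negative
    num = sum(yj - yi for (xi, yi), (xj, yj) in combinations(pairs, 2))
    den = sum(xj - xi for (xi, yi), (xj, yj) in combinations(pairs, 2))
    if den == 0:
        raise ValueError("all x values are equal; slope is undefined")
    return num / den

def rank_weighted_slope(xs, ys):
    """Equivalent form: sum of y_i (i - ibar) over sum of x_i (i - ibar), ranks i from 1."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    i_bar = (n + 1) / 2.0
    num = sum(y * (i - i_bar) for i, (_, y) in enumerate(pairs, start=1))
    den = sum(x * (i - i_bar) for i, (x, _) in enumerate(pairs, start=1))
    if den == 0:
        raise ValueError("all x values are equal; slope is undefined")
    return num / den

xs = [1.0, 2.0, 4.0, 5.0, 7.0]
ys = [2.1, 3.9, 8.2, 9.8, 14.1]
slope_pairs = pairwise_difference_slope(xs, ys)
slope_ranks = rank_weighted_slope(xs, ys)
```

The two forms agree because each ordered observation appears in the pairwise
total with net coefficient 2i - n - 1, which is twice its rank deviation.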

K. R. Nair (1948) studies the distribution of the extreme deviate from
the sample mean, $w = x_1 - \bar{x}$, where $x_1, \ldots, x_k$ are ordered values
in a sample of size k from the unit normal distribution and $\bar{x}$ is their
mean, as well as the distribution of its studentized form, w/s, where s is an
independent unbiased estimator of the population variance. He uses the latter
distribution as the basis of a new criterion for rejection of outliers, which
he compares with the criteria of Irwin, Tippett, Student, McKay and Thompson.

K. C. Sreedharan Pillai (1948) determines the information (as defined by
Fisher) furnished by each order statistic $x_i$ (i = 1, 2, ..., n) in a sample
of size n from a normal distribution concerning the mean $\mu$ and the variance
$\sigma^2$, and tabulates results for n = 2, 3, ..., 12; i = 1, 2, ..., [n/2]+1.
Not surprisingly, these tables show that the central values give the most
information concerning $\mu$ and the extreme values concerning $\sigma^2$. The
author determines a function of n which, when multiplied by the semirange
$(x_n - x_1)/2$, yields an unbiased estimator of $\sigma$, and studies the
distribution of the semirange.

K. R. Nair (1949), in a follow-up of his previous note [Nair (1947)] on
the mean deviations from the median and from the mean, and their use in
estimating the standard deviation $\sigma$ of a normal population, shows that the
coefficients of variation of the two mean deviations are almost the same for
samples of size n when $2 \le n \le 10$.

W. R. Purcell (1949) elaborates on the use of the median life and the
shortest life instead of the mean life and the range, as proposed in his
earlier paper [Purcell (1947)], and gives an example of their successful use
in saving time in life tests on incandescent lamps.

K. J. Shone (1949) studies the use of the sample range in estimating the
standard deviation of nonnormal populations. Let $\sigma$, $\bar{r}$,
$\sigma_r$, and N represent the sample standard deviation, the mean range, the
standard deviation of the range, and the sample size, respectively. For N = 2,
he finds that $2\sigma^2 = \bar{r}^2 + \sigma_r^2$ for all populations whose
variance is finite; for N = 3,
$\bar{r}/\sigma = 2.10 - 0.81\,\sigma_r/\bar{r}$ for eighteen discrete unimodal
distributions; for N = 4 and N = 5, respectively,
$\bar{r}/\sigma = 2.29 - 0.69\,\sigma_r/\bar{r}$ and
$\bar{r}/\sigma = 2.41 - 0.46\,\sigma_r/\bar{r}$ for five selected populations of
extreme form.

John Wilder Tukey (1949a,b,c,d) and Theodore E. Harris and Tukey (1949)
report on a study of sampling from contaminated distributions. Tukey (1949a,
c,d) studies the relative efficiencies and effectivenesses (in large samples)
of various estimation procedures when the distribution differs from normality
in the direction of long tails (resulting from a mixture of two normal
distributions with the same mean and different standard deviations). Harris and
Tukey (1949) and Tukey (1949b) consider the relative efficiencies of estimators
obtained from the mean and standard deviation by removing the extreme γ% at
each end of the sample, for varying degrees of contamination, both in large
and  in  moderately  large  samples. 

William  John  Youden  (1949)  exposes  the  fallacy  in  the  common  practice  of 
making three measurements, averaging the two values closest together and
discarding the other. Intuition suggests that if two of the three measurements
are in close agreement while the third is considerably removed from either of
the  others,  then  there  may  be  grounds  for  suspecting  and  perhaps  rejecting  the 
third  value.  Analysis  shows,  however,  that  for  samples  of  three  from  a  normal 
distribution,  one  of  the  measurements  will  be  at  least  19  times  farther  away 
from  its  neighbor  than  the  distance  separating  the  two  closest  in  one  sample 
out  of  twelve;  hence  it  appears  that  measurements  that  should  be  retained  are 
often  discarded. 
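
The one-in-twelve figure can be checked by simulation. The sketch below is a
Monte Carlo illustration, not Youden's analysis: it draws triples from a
standard normal distribution, compares the larger gap between neighbouring
ordered values with the smaller one, and counts how often the ratio reaches
19. The function names, seed and number of trials are assumptions.

```python
import random

def gap_ratio(sample):
    """Larger gap between neighbouring ordered values divided by the smaller gap."""
    a, b, c = sorted(sample)
    small, large = sorted((b - a, c - b))
    return float("inf") if small == 0 else large / small

def fraction_with_ratio_at_least(threshold, trials, seed=0):
    """Share of simulated normal triples whose gap ratio reaches the threshold."""
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        triple = [rng.gauss(0.0, 1.0) for _ in range(3)]
        if gap_ratio(triple) >= threshold:
            hits += 1
    return hits / trials

estimate = fraction_with_ratio_at_least(19.0, 100_000)
```

An estimate close to 1/12 means that roughly one triple in twelve would lose
a perfectly good measurement under the "closest two" habit.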

Wilfrid J. Dixon (1950) proposes new criteria, based on the ratio of the
differences of two pairs of order statistics, for rejection of outlying
observations. He compares the performance of these criteria with those of
Irwin, McKay, Thompson, Nair, and Grubbs for detecting contamination of
samples from a normal population with mean $\mu$ and variance $\sigma^2$,
$N(\mu, \sigma^2)$, by one or more observations from
(a) $N(\mu + \lambda\sigma, \sigma^2)$ or (b) $N(\mu, \lambda^2\sigma^2)$.

Grubbs (1950) also proposes a new criterion for rejection of outliers,
the criterion being the ratio of the sums of squares of deviations from the
mean for the truncated sample (with the observation or observations in question
omitted) and for the complete sample. He obtains and tabulates the distribu¬
tion  of  this  ratio  for  one  extreme  observation  and  for  two  extreme  observations 
both  at  the  same  end;  he  does  not  examine  the  criterion  for  one  extreme  at  each 
end. 

Theodore  E.  Harris  (1950)  gives  a  simple  explanation,  with  a  numerical 
example, of a procedure, essentially that of Edgeworth (1923) and Rhodes (1930),
for fitting a regression line Y = a + bX by minimizing the sum of the absolute
deviations rather than the sum of squares of the deviations. He points out
the  relation  between  this  problem  and  linear  programing. 
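
A line minimizing the sum of absolute deviations can be found, for small data
sets, without any programming machinery. The sketch below is a brute-force
illustration and not the Edgeworth, Rhodes or Harris procedure: it relies on
the standard fact that some minimizing line passes through at least two of
the data points, and simply tries every such line. The function name and the
data are hypothetical.

```python
from itertools import combinations

def lad_line(xs, ys):
    """Return (a, b, cost) minimizing sum |y - a - b*x| over lines through two data points."""
    points = list(zip(xs, ys))
    best = None
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 == x2:
            continue   # a vertical line is not of the form Y = a + bX
        b = (y2 - y1) / (x2 - x1)
        a = y1 - b * x1
        cost = sum(abs(y - a - b * x) for x, y in points)
        if best is None or cost < best[2]:
            best = (a, b, cost)
    if best is None:
        raise ValueError("need at least two observations with distinct x values")
    return best

# Four points on Y = 1 + 2X and one gross outlier at x = 4.
intercept, slope, total_abs_dev = lad_line([0, 1, 2, 3, 4], [1, 3, 5, 7, 30])
```

The fitted line ignores the outlier entirely, which is the behaviour that
distinguishes this criterion from least squares. The search costs on the
order of n cubed operations, so it is only a teaching device.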

F. M. Henry (1950) studies the loss of precision from discarding
discrepant data. In particular, he applies the rule of Goodwin (1913): "When the
number of observations is small, reject any observation that deviates more than
4 A.D. from the sample mean, the mean and A.D. [average deviation] being
computed with the omission of the doubtful observation" to series of five
measurements of a time interval with a stop watch, and the "best two out of three" pro¬
cedure  to  series  of  three  such  measurements.  In  both  cases  he  finds  that 
use  of  the  procedure  results  in  an  increased  rather  than  a  decreased  error; 
in  the  case  of  three  measurements,  not  only  the  average  of  all  three  measure¬ 
ments,  but  also  the  average  of  the  two  extreme  measurements  (the  midrange) 
gives  better  results. 

E. S. Pearson (1950) investigates the estimation of the standard deviation
of a population from the range of a sample of size n or the mean range of N
items divided into m groups of n items each. Even when (a) the population is
not normal or (b) the sample includes one or more outliers, he concludes that
use of the range, with adjustment appropriate for a normal population, is
justified  provided  n<10. 

K.C.S. Pillai (1950) finds, in a form suitable for numerical calculations,
the distributions of the midrange and the semirange and their joint
distribution for samples of size n from a standard normal population, N(0,1).

G. R. Seth (1950) finds the joint distribution of the two closest
observations x', x" (x' < x") of a set of three, given the distribution of
$x_1, x_2, x_3$; he also finds the joint distribution of u = (x" - x') and
w = (x" - x')/($x_3 - x_1$) in general, and the joint density function of u and
w and the marginal density functions of u and of w, all when the underlying
distribution is normal with mean $\theta$ and variance unity, N($\theta$,1). He
also obtains the joint density function of u = x" - x' and v = (x' + x")/2, as
well as the marginal density function of v, which has mean $\theta$ and variance
$1/2 + \sqrt{3}/(4\pi)$.

R. K. Zeigler (1950) shows that, for a random sample of size 2k+1 from a
distribution which has a finite second moment and which is continuous at
$x = \theta$ with $f(\theta) \ne 0$, $\theta$ being the population median, the
joint distribution of the
sample  median  and  the  mean  deviation  from  the  sample  median  is  asymptotically 
bivariate  normal,  and  gives  the  asymptotic  means,  variances,  and  correlation 
coefficient. 

D. H. Bhate (1951) shows that, for symmetrical probability functions
which are members of the Pearson family, the mean of two symmetrically placed
elements in an ordered sample (a quasi-median or quasi-midrange) is more
efficient than the median as an estimate of the central value. He demonstrates
by an example that this statement is not true for all symmetrical probability
functions.

Brown and Mood (1951) propose a method based on medians for determining
the coefficients in a multiple linear regression equation. Let the dependent
variable y be distributed with median $\beta_0 + \sum_{r=1}^{k} \beta_r z_r$,
and suppose we have a sample of n sets of associated observations
$y_i, z_{1i}, z_{2i}, \ldots, z_{ki}$, with i = 1, 2, ..., n. Then the
coefficients are estimated by the numbers $\hat\beta_r$ such that the median of
the residuals $y_i - \hat\beta_0 - \sum_s \hat\beta_s z_{si}$ is zero, both over
the observations with $z_{ri} \le \tilde{z}_r$ and over those with
$z_{ri} > \tilde{z}_r$, for r = 1, 2, ..., k, where $\tilde{z}_r$ is the median
of the n observations $z_{ri}$.

Dixon (1951) finds the distribution of the ratio
$r = (x_n - x_{n-j})/(x_n - x_i)$ for some small values of i and j, where
$x_1 \le x_2 \le \cdots \le x_n$ are the order statistics of a sample of size
$n \le 30$ from a population which is (1) rectangular or (2) normal.
He tabulates 3-decimal-place percentage points, corresponding to cumulative
probability $\alpha$ = .005, .01, .02, .05, .1(.1).9, .95, for r when j = 1, 2
and i = 1, 2, 3, for samples of size n = (i+j+1)(1)30 from a normal population.
These tabular values are useful in applying the criteria for rejection of
outliers proposed by the author in his earlier paper [Dixon (1950)].
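As a rough illustration of the kind of statistic being tabulated, the following
Python sketch computes a ratio of the form $(x_n - x_{n-j})/(x_n - x_i)$ from
the ordered sample. The exact indexing is an assumption made here for
illustration, and the function name `dixon_ratio` and the data are
hypothetical; critical values must still be taken from Dixon's tables.

```python
def dixon_ratio(sample, i=1, j=1):
    """Ratio (x_n - x_{n-j}) / (x_n - x_i) of the ordered sample (1-based indices)."""
    x = sorted(sample)
    n = len(x)
    if n < i + j + 1:
        raise ValueError("need at least i + j + 1 observations")
    gap = x[n - 1] - x[n - 1 - j]   # x_n - x_{n-j}: distance of the largest value from its neighbour
    spread = x[n - 1] - x[i - 1]    # x_n - x_i: reference spread of the sample
    if spread == 0:
        raise ValueError("ratio is undefined when x_n equals x_i")
    return gap / spread


data = [10.1, 10.3, 9.9, 10.2, 12.5]   # made-up measurements with one suspect value
r_11 = dixon_ratio(data, i=1, j=1)
```

A large value of the ratio suggests that the largest observation is unusually
far from the rest of the sample.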

H. O. Hartley and E. S. Pearson (1951) tabulate the moment constants of
the distribution of the range in samples of size n = 2(1)20 drawn from a normal
population with unit variance. They note that there are some discrepancies
between their table and some earlier results of Grubbs & Weaver (1947).

Ray Bradford Murphy (1951) treats the problem of outlying observations
in samples from univariate normal populations as one in linear hypotheses. In
particular, he introduces t-tests for outliers from a single universe and
likelihood ratio tests for outliers from several universes. He discusses the
problems of testing for all possible numbers of outliers, k, subject only to
the restriction that 2k < n, where n is the sample size.

J. H. Cadwell (1952) finds improved approximate formulas (polynomials in
$n^{-1}$) for the ratio of the standard error of the median to the standard
error of the mean for random samples of size n drawn from a normal population.
Numerical examples show good agreement with the exact results of Hojo (1931).
The author shows how to extend the result so as to obtain the corresponding
ratio for any quantile of a sample from a continuous population.

Anders Hald (1952a, b) gives theory and tables for the distribution of the
range. He also discusses the distributions of the largest observation and of
its deviations from the population mean and the sample mean, as well as criteria
for the rejection of outlying observations.

Tsuruchiyo Homma (1952) obtains possible limit laws for the range and
the midrange of samples from a continuous population, and shows that the range
and the midrange are not asymptotically independent.

Norman L. Johnson (1952) gives an approximation, valid for w small and n
not too large, for the probability $P_n(w)$ that the range of n independent
random variables does not exceed w. He also gives an approximation for the
critical values $w_\alpha$ satisfying $P_n(w_\alpha) = \alpha$.

Julius Lieblein (1952) investigates the distributions of several statistics
involving the closest pair of observations in a sample of size three from
rectangular and normal populations, and calculates their means and standard
deviations. He shows that the ratio of the difference of the closest pair of
observations to the range is a poor criterion for rejecting outlying
observations, and finds the distribution of the outlying observation for a
rectangular population.

K. R. Nair (1952) extends his earlier table [Nair (1948)] of percentage
points of the studentized extreme deviate from the sample mean to cover more
sample sizes and more significance levels.

Calyampudi Radhakrishna Rao (1952) gives examples involving the use of
the largest and/or smallest observations to estimate the parameter(s) of a
rectangular population, the asymptotic distribution of quantiles, and the
efficiency of the sample median as an estimate of the mean of a normal
population.

J. H. Cadwell (1953) presents a method for evaluating the probability
density function of the r-th quasi-range, $w_r$, of a sample of size n from a
normal population, and tabulates percentage points and moments of $w_1$ for
n = 10(1)30. He investigates the efficiency of quasi-ranges in estimating the
population standard deviation, and finds $w_0$ (the range) to be most
efficient for $2 \le n \le 17$ and $w_1$ most efficient for $18 \le n \le 31$.

Dixon (1953) studies the problem of contamination of a sample supposed
to be drawn from a normal population with mean $\mu$ and variance $\sigma^2$,
$N(\mu, \sigma^2)$, by drawing a proportion $\gamma$ of the observations from
either $N(\mu + \lambda\sigma, \sigma^2)$ or $N(\mu, \lambda^2\sigma^2)$.
He discusses the estimation of $\mu$ by use of the mean and the median, the
estimation of $\sigma^2$ (or $\sigma$) by the sample variance and the range,
and gives recommended rules for processing data under various conditions of
contamination.

Enoch B. Ferrell (1953) proposes the construction of quality control
charts using ranges and midranges within subgroups and medians of these
statistics between subgroups. He contends that this method gives more useful
estimates of the true population parameters than the conventional method when
outlying observations due to assignable causes of variation are present, and
that it is more effective in detecting and locating assignable causes, besides
involving simpler computations.

Harman Leon Harter (1953) applies the principle of maximum likelihood
to the problem of determining the regression equation of one variable on p
others. He shows that for a normal distribution of residuals, the maximum
likelihood solution is the least squares solution, found by minimizing the
sum of the squares of the residuals, while for a Laplace (first) distribution
of residuals, the maximum likelihood solution is found by minimizing the sum
of the absolute values of the residuals. For distributions of residuals with
finite limits, only certain solutions are admissible, and either of the above
methods may lead to an inadmissible solution. For a rectangular distribution
of residuals, the likelihood function is a constant, and there is no unique
maximum likelihood solution, one admissible solution being just as likely as
another.
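The correspondence between error law and fitting criterion is easiest to see in
the one-parameter (location) case, which is used here only as a simplified
stand-in for the regression problem: the sum of squared residuals is minimized
by the sample mean and the sum of absolute residuals by the sample median. The
sketch below checks this numerically on made-up data by a grid search; the
function names are hypothetical.

```python
from statistics import mean, median


def sum_sq(data, c):
    """Sum of squared residuals about c (criterion implied by normal errors)."""
    return sum((x - c) ** 2 for x in data)


def sum_abs(data, c):
    """Sum of absolute residuals about c (criterion implied by Laplace errors)."""
    return sum(abs(x - c) for x in data)


data = [1.0, 2.0, 3.0, 4.0, 20.0]           # made-up sample with one wild value
grid = [k / 100 for k in range(0, 2501)]    # candidate locations 0.00, 0.01, ..., 25.00

best_sq = min(grid, key=lambda c: sum_sq(data, c))    # least squares location
best_abs = min(grid, key=lambda c: sum_abs(data, c))  # least absolute deviations location

print(best_sq, mean(data))     # both are the sample mean
print(best_abs, median(data))  # both are the sample median
```

The wild value pulls the least squares solution well away from the bulk of the
data, while the least absolute deviations solution stays at the middle
observation.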

E. P. King (1953) shows that, when the criteria of Grubbs (1950) and Dixon
(1951) are employed to detect the presence of a single outlier, the effect of
using a test statistic based on the more deviant of the two extremes, thus
testing a two-sided hypothesis, is approximately, but not exactly, to double
the significance level of the standard test procedure.

Edwin Glenn Olds (1953) studies the problem of finding the coefficients
a and b in the equation of the best-fitting straight line Y = a + bX when values
of y are observed corresponding to a fixed set of x-values. He points out that
when y has a normal distribution with constant variance for each x, the
solution can be found either by the method of least squares or by the method of
maximum likelihood. When y has a rectangular distribution, the method of
maximum likelihood does not, in general, give a unique solution, and the method
of least squares sometimes yields a solution which is inconsistent in that the
residual (the difference between the predicted and observed y-values) for
one or more x-values may lie outside the admissible interval (-c, +c), where
2c is the range of the rectangular distribution assumed for the residuals. The
author adopts the least squares solution whenever it is consistent; when it is
not, he shows how to find a modified least squares solution which minimizes
the sum of squares of the residuals subject to the restriction that the
absolute value of each residual must be less than or equal to c.

Frank Proschan (1953) advocates, for the rejection of outlying
observations, Dixon's criterion based on the extreme observation when no past
data are available and Nair's criterion based on the studentized extreme deviate
when past data are available for use in obtaining an independent estimate of
the standard deviation of an individual measurement. He tabulates critical
values at the 5% and 1% levels for both tests.

Youden (1953) summarizes available results on the situation in which three
measurements are made and the two showing the best agreement are selected. The
difference between the selected measurements averages about four-tenths
($3 - 3\sqrt{3}/2$) that of the difference for honest duplicates [Lieblein
(1952)]. The dispersion of the average of the selected pair is 12 per cent
larger than that of the average of duplicates. Let d be the difference between
the selected pair and D the difference between the discarded measurement and
the nearer of the selected ones. The interval D is ten or more times as large
as d in 15.7 percent of sets of three measurements. More than one-third of the
time, D is at least four times as large as d. Values of the ratio D/d exceed
32.57 once in twenty times [Youden (1949)].
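The four-tenths figure can be checked by simulation. The sketch below assumes
normally distributed measurements (an assumption made here; it is not stated
explicitly in the summary), estimates the mean difference of the
best-agreeing pair among three by Monte Carlo, and divides it by the exact mean
absolute difference of two normal values, $2/\sqrt{\pi}$. The function name,
trial count and seed are arbitrary choices.

```python
import random
from math import pi, sqrt


def mean_closest_gap(trials=100_000, seed=1):
    """Monte Carlo mean of the smaller gap among three N(0, 1) measurements."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        a, b, c = sorted(rng.gauss(0.0, 1.0) for _ in range(3))
        total += min(b - a, c - b)   # difference of the two values in best agreement
    return total / trials


mean_duplicate_gap = 2 / sqrt(pi)            # exact E|x1 - x2| for two N(0, 1) values
simulated_ratio = mean_closest_gap() / mean_duplicate_gap
quoted_ratio = 3 - 3 * sqrt(3) / 2           # the constant quoted in the text, about 0.402

print(round(simulated_ratio, 3), round(quoted_ratio, 3))
```

With this many trials the simulated ratio should agree with the quoted constant
to about two decimal places.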

Cadwell (1954) gives an asymptotic expression for the probability integral
of the range for samples from a symmetric unimodal distribution, and
investigates its accuracy for the case of samples of size 20 to 100 from a
normal population. For this range of sample sizes the errors are small, and
they can be made less than 0.0001 by using a correction based on values given
in the paper. The author tabulates percentage points of the range for samples
of size n = 20, 40, 60, 80 and 100 from a normal population.

David R. Cox (1954) studies the mean range and the coefficient of variation
of the range in samples of size 2, 3, 4, and 5 from different types of
populations covering a wide range of values of $\beta_1 = \alpha_3^2$ and
$\beta_2 = \alpha_4$, the measures of skewness and kurtosis, including
symmetric and asymmetric mixtures of normal distributions, the normal
distribution, the rectangular distribution, exponential type distributions, the
Pearson system, and Shone's numerical results for five discrete distributions.
He tabulates the normalized mean range and the coefficient of variation of the
range to 5 decimal places for samples of size 2, 3, 4 and 5 for
$\beta_2$ = 1.0(.2)2.0(.5)5.0(1.0)9.0; $\beta_1$ is not a determining factor.
He compares the distributions of the range for samples from exponential and
normal populations, and applies the results to estimation of dispersion by use
of the range.

Herbert A. David (1954) finds the cumulative distribution function and
the expected value of the range of samples from five non-normal populations,
and makes numerical comparisons of the results with the corresponding ones for
samples from a normal population.

H. A. David, H. O. Hartley and E. S. Pearson (1954) approximate the
distribution of u = w/s [where $w = x_n - x_1$,
$(n-1)s^2 = \sum_i (x_i - \bar{x})^2$, and $x_1 \le \cdots \le x_n$ is an
ordered random sample with mean $\bar{x}$ from a normal population] by
selecting a curve from the Pearson system with the proper first four moments.
For specific values of n they compare the result with that of an exact
alternative derivation. After examining certain non-normal populations, they
suggest that u may be useful in detecting departures from normality.

Gumbel (1954) derives the continuous cumulative distribution function
with specified mean and variance for which the expected value of the largest
of n independent observations is a maximum and the continuous c.d.f. with
specified variance for which the mean range is a maximum. The latter result,
obtained by a different method, was previously given by R. L. Plackett (1947).
The former result, obtained independently, is given by Hartley and David (1954),
who also obtain an upper bound for the expected value of the m-th order
statistic and best upper and lower bounds of the sample range of x under the
restrictions that the mean and variance of x are 0 and 1 respectively and
values of x are restricted to the closed interval [a,b], where a and b are
given constants. They derive the distributions for which the upper bounds are
attained, and show that the lower bound is attained for a discrete distribution
where x may assume only two values. These results are of interest in assessing
the bias that may result from the unwarranted assumption of normality when
using the sample range to estimate the population standard deviation.

E. S. Pearson and H. O. Hartley (1954) give tables of moment constants,
probability integral and percentage points of the range; also tables of
percentage points of the extreme standardized deviate from the population mean
and from the sample mean, the extreme studentized deviate from the sample mean,
and the ratio of range to standard deviation in the same sample, all for samples
from a normal population. Various applications of these tables, including the
rejection of outlying observations, are discussed in the introduction.

George J. Resnikoff (1954) discusses various approximations to the
distribution of the average range, and tabulates percentage points of the
average range for subgroups of size five, commonly used in quality control work,
for samples of size N = 5m, where m is an integer.

F. Zitek (1954) discusses various measures of sample dispersion, including
standard deviation, mean deviation, and range, which may be used in
estimating the standard deviation of a normal population. For n = 2(1)15, he
tabulates normalizing factors which make these estimators unbiased and variances
of the resulting unbiased estimators whenever these are available in the
literature, and makes some observations on the efficiency of the estimators.

John T. Chu (1955a) obtains upper and lower bounds for the cumulative
distribution function of the median $\tilde{x}$ of a sample of (2n+1)
observations on a random variable X from a population with probability density
function f(x) and unique median $\xi$. He shows that the approach to normality
of the distribution of $\tilde{x}$ is rapid when X is normally distributed, but
much slower when X has a rectangular or a Laplace (first) distribution. Chu
(1955b) shows that, under very general conditions,
$\operatorname{var} \tilde{x} \ge \{4[f(\xi)]^2 (2n+3)\}^{-1}$, as compared
with the asymptotic variance $\{4[f(\xi)]^2 (2n+1)\}^{-1}$, with the equality
holding for the rectangular distribution. He shows that the sample mean
$\bar{x}$ is more efficient (has smaller variance) than $\tilde{x}$ for many
symmetric distributions, notable exceptions being the Laplace and Cauchy
distributions.
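The equality case can be verified exactly. The sketch below relies on a
standard fact that is not stated in the summary: the median of 2n+1
observations from the rectangular distribution on (0, 1) follows a
Beta(n+1, n+1) distribution, whose variance has a closed form. It also assumes
the bound has the form $\{4[f(\xi)]^2(2n+3)\}^{-1}$ with $f(\xi) = 1$ for this
population. The function names are hypothetical.

```python
from fractions import Fraction


def median_variance_uniform(n):
    """Exact variance of the median of 2n+1 uniform(0, 1) values, i.e. of Beta(n+1, n+1)."""
    a = n + 1
    # Var of Beta(a, b) is ab / ((a+b)^2 (a+b+1)); here a = b = n + 1.
    return Fraction(a * a, (2 * a) ** 2 * (2 * a + 1))


def lower_bound(n, density_at_median=1):
    """Bound {4 f(xi)^2 (2n+3)}^(-1) on the variance of the median."""
    f = Fraction(density_at_median)
    return 1 / (4 * f * f * (2 * n + 3))


def asymptotic_variance(n, density_at_median=1):
    """Asymptotic variance {4 f(xi)^2 (2n+1)}^(-1) of the median."""
    f = Fraction(density_at_median)
    return 1 / (4 * f * f * (2 * n + 1))


for n in range(1, 6):
    print(n, median_variance_uniform(n), lower_bound(n), asymptotic_variance(n))
```

For every n the exact variance coincides with the bound and lies below the
asymptotic value, which is what equality for the rectangular case requires.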

Chu and Harold Hotelling (1955) show that, under certain regularity
conditions, the central moments of the sample median are asymptotically equal
to the corresponding moments of the asymptotic distribution, which is normal.
They give a general approximation procedure for the moments of the median which
involves expanding the inverse of the cumulative distribution function in a
Taylor series; the approximation error can be made arbitrarily small by using
a sufficiently large number of terms in the expansion. They apply the method
to the normal, Laplace, and Cauchy distributions; for the first two of these
they obtain upper and lower bounds for the variance of the median by a much
simpler procedure. They obtain detailed results concerning the medians of
samples drawn from a normal population.

J. Arthur Greenwood (1955) expresses the differential of the probability
of the s-th range [quasi-range] in terms of Bessel functions of the third kind,
and integrates by parts to obtain the distribution of the s-th range.

Max Halperin, Samuel W. Greenhouse, Jerome Cornfield and Julia Zalokar
(1955) tabulate, to three significant figures, the upper and lower 5% and 1%
points of the studentized maximum absolute deviate
$d = \max_i |x_i - \bar{x}|/s$, where the $x_i$ (i = 1, ..., k) are independent
and each $N(\mu, \sigma^2)$, and where $m s^2/\sigma^2$ is distributed as
$\chi^2$ with m degrees of freedom and independent of the $x_i$, for
k = 3(1)10(5)20(10)40, 50 and m = 3(1)10(5)20(10)40, 60, 120. They give
examples to illustrate the use of the tables for various purposes, including an
outlier test which is the two-sided version of Nair's test.

George William Thomson (1955) shows that bounds exist for w/s, the ratio
of range to standard deviation in the same sample of size n, for all
populations with non-zero variance. He tabulates upper and lower bounds for w/s
to three decimal places for n = 3(1)20(10)60(20)100(50)200, 500, 1000; also
lower and upper 0.1%, 0.5%, 1.0%, 2.5%, 5.0% and 10.0% points and the median
(50% point) of w/s to five decimal places for samples of size n = 3 from a
normal population.

Tukey (1955) shows that various characteristics (e.g. percentage points,
expectation, reciprocal standard deviation) of the range of samples from a
normal population behave asymptotically like the square root of a log (bn+c),
where n is the sample size and a, b, c are appropriate constants. He uses this
fact as an aid in interpolating between tabular values of these characteristics.

H. O. Behrens (1956) proposes certain factors for use in determining the
standard deviation approximately either from the mean deviation (taken from
the mean) or from the range. He compares these factors with those developed
by other authors [including Tippett (1925), Pearson (1932) and Pearson, Godwin
& Hartley (1945)], and refers to the different bases of the two kinds of
factors. He contends that his factors are useful for the objectives generally
pursued by experiment stations.

Juan Bejar (1956) defines the median regression curve of a bivariate
distribution f(x,y) as the locus y = g(x) of the median of the conditional
distribution f(y|x), and gives its general properties. Since g(x) is not easy
to obtain, the author introduces the linear regression, and then the polynomial
regression, which minimize the mean deviation instead of the mean square
deviation as in the mean regression curve. He points out that the calculation
of the median regression involves minimizing a linear expression with the
variables constrained by inequalities, and is therefore closely related to
linear programming.

Chester I. Bliss, William G. Cochran and John W. Tukey (1956) propose a
new criterion, based on the range, for the rejection of outliers. The test
statistic is the largest range in k sets of n measurements divided by the
sum of all the ranges. If the observed ratio exceeds the 5% critical value,
which they tabulate for various values of k and n, they conclude that the set
having the largest range contains an outlier, which is identified by inspection
and rejected.

H. A. David (1956) gives tables of the upper percentage points of the
studentized extreme deviate from the sample mean like those of Nair (1948,
1952) [reprinted by Pearson & Hartley (1954)], but corrected by using a better
approximation.
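The range-ratio statistic of Bliss, Cochran and Tukey (1956) described above is
simple to compute. The following is a minimal sketch on made-up data; the
function name is hypothetical, and the critical value against which the ratio
is compared must come from the authors' tables, which are not reproduced here.

```python
def largest_range_ratio(groups):
    """Return (largest range / sum of all ranges, index of the set with the largest range)."""
    ranges = [max(g) - min(g) for g in groups]
    total = sum(ranges)
    if total == 0:
        raise ValueError("all ranges are zero; the ratio is undefined")
    suspect = max(range(len(ranges)), key=ranges.__getitem__)
    return ranges[suspect] / total, suspect


# Three made-up sets of three measurements each (k = 3, n = 3).
groups = [[10, 12, 11], [9, 10, 11], [10, 11, 19]]
ratio, suspect = largest_range_ratio(groups)
print(ratio, suspect)
```

If the ratio exceeds the tabulated critical value for the given k and n, the
set at index `suspect` is the one to inspect for an outlier.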

Akio Kudo (1956) proposes a new criterion for the rejection of outlying
observations. Given three sets of independent observations {x}: (i) $x_i$ from
$N(m_i, \sigma^2)$, $i = 1, 2, \ldots, n_1$; (ii) $n_2$ from $N(m, \sigma^2)$;
and (iii) $n_3$ from $N(m', \sigma^2)$. One of $n_1 + 1$ possible decisions,
$D_i$ (accept $H_i$), is to be made, where $H_0$:
$m_1 = m_2 = \cdots = m_{n_1} = m$, and $H_i$ ($i \ne 0$): each $m_j$ (except
$m_i$) $= m = m_i - \Delta$. The author presents a decision procedure for
which $\Pr(\text{acc. } D_0 \mid H_0) = 1 - p$ and
$\Pr(\text{acc. } D_i \mid H_i)$ is maximized for $i \ne 0$. The optimum
decision procedure involves $x_M = \max(x_1, \ldots, x_{n_1})$; $\bar{x}$, the
mean of samples (i) and (ii); and S, the overall standard deviation using
$\bar{x}$ for (i) and (ii) and $\bar{x}'$ for (iii). The decision rule is:
select $D_0$ if $(x_M - \bar{x})/S < \lambda_p$, and select $D_M$ if
$(x_M - \bar{x})/S > \lambda_p$. If $\sigma$ is known, S is replaced by
$\sigma$; in this case set (iii) is not needed, and different $\lambda_p$ are
needed. In each case, the author states that the critical values are to be
published later [see Kudo (1958)].

Harold Ruben (1956) shows that the product moments of the extreme order
statistics in samples of even sizes from normal populations can be expressed
as linear functions of the products of the contents of certain hyperspherical
simplices, and uses this fact to obtain simple explicit expressions for the
variance of the sample range for samples of size 2 and 4.

Juan Bejar (1957) gives a method, similar to linear programming, to
determine the regression line y = a + bx such that $\sum |y_i - a - bx_i|$ is a
minimum or the regression plane z = a + bx + cy such that
$\sum |z_i - a - bx_i - cy_i|$ is a minimum. He gives two examples in which he
arranges the data to make the shortest calculations.
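For small data sets, a line minimizing the sum of absolute residuals can be
found without any linear programming machinery. The sketch below is not Bejar's
procedure; it uses the standard property that, provided the x-values are not all
equal, some minimizing line passes through at least two of the data points, and
simply tries every such line. The function name and data are hypothetical.

```python
from itertools import combinations


def lad_line(points):
    """Least-absolute-deviations line found by trying every line through two data points.

    Returns (loss, a, b) for the fitted line y = a + b*x.
    """
    best = None
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 == x2:
            continue                      # a vertical line is not of the form y = a + b*x
        b = (y2 - y1) / (x2 - x1)
        a = y1 - b * x1
        loss = sum(abs(y - a - b * x) for x, y in points)
        if best is None or loss < best[0]:
            best = (loss, a, b)
    if best is None:
        raise ValueError("need at least two points with distinct x-values")
    return best


# Four points on the line y = x plus one wild observation (made-up data).
points = [(0, 0.0), (1, 1.0), (2, 2.0), (3, 3.0), (4, 10.0)]
loss, a, b = lad_line(points)
print(loss, a, b)
```

The search costs on the order of n cubed operations, so it is only a teaching
device; for larger problems a linear programming formulation of the kind the
author describes is the practical route.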

Dixon (1957) discusses several simple estimates of the mean and standard
deviation of a normal population. Estimates of the mean considered are the
median, the midrange, the mean of the best two (in the sense of minimum
variance), and the mean of all but the largest and smallest. Estimates of the
standard deviation studied are various linear combinations of quasi-ranges.
The efficiencies of these estimates are compared with those of the sample mean
and sample standard deviation and the best linear unbiased estimates for
samples of size n = 2(1)20.

A. Ghosal (1957) derives formulas for the distribution of the r-th
quasi-range $W_r = x_{n-r} - x_{r+1}$ of samples of size n from rectangular and
exponential distributions; tabulates their first four moment constants for
r = 0, 1, 2 and n = 5, 10, 15, 20; and compares the efficiencies of $W_r$
(r > 0) and $W_0$ as estimators of the population standard deviation. For the
exponential distribution, he finds that $W_1$ is more efficient than $W_0$ for
$n \ge 9$ and $W_2$ is more efficient than $W_1$ for $n \ge 17$.
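Since quasi-ranges recur throughout this survey, a short sketch of the
definition $W_r = x_{n-r} - x_{r+1}$ (order statistics indexed from 1) may be
useful. The function name and data are hypothetical; r = 0 gives the ordinary
range.

```python
def quasi_range(sample, r=0):
    """r-th quasi-range: (r+1)-th largest minus (r+1)-th smallest observation."""
    x = sorted(sample)
    n = len(x)
    if r < 0 or n < 2 * r + 2:
        raise ValueError("need r >= 0 and at least 2r + 2 observations")
    # 1-based x_{n-r} and x_{r+1} become 0-based x[n-r-1] and x[r].
    return x[n - r - 1] - x[r]


data = [3, 8, 1, 9, 5, 7]          # made-up sample; sorted: 1, 3, 5, 7, 8, 9
print(quasi_range(data, 0))        # range: 9 - 1
print(quasi_range(data, 1))        # first quasi-range: 8 - 3
print(quasi_range(data, 2))        # second quasi-range: 7 - 5
```

Turning a quasi-range into an unbiased estimate of the standard deviation
requires a population-specific divisor (its expected value for unit standard
deviation), which is what the tables cited in this section supply.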

B. I. Harley and E. S. Pearson (1957) tabulate the probability integral and
percentage points of the range for samples of size n = 200 from a normal
population. They indicate that the results will be useful in connection with a
suggestion by David, Hartley & Pearson (1954) that a comparison of the range
and root-mean-square estimators of the population standard deviation may serve
as a test of homogeneity or as a routine check of accuracy in computation and
also in connection with methods of interpolation suggested by Tukey (1955).

Motosaburo Masuyama (1957) derives upper and lower bounds on the ratios
of the population standard deviation $\sigma$ to the expectation of the sample
range and of the population variance $\sigma^2$ to the expectation of the
square of the sample range, and suggests that the harmonic mean of the
appropriate pair of these bounds may be used for all distributions as a
multiplier of the sample range or its square in estimating $\sigma$ or
$\sigma^2$.

Rider (1957) studies the distribution of the midranges of samples from
five symmetric populations of limited range and the relative efficiencies of
sample midrange and mean in estimating the population midrange (which is
identical with the population mean and median). He finds that the midrange
is more efficient than the mean for all of the populations considered (which
have standardized fourth moment $\alpha_4$ = 2.19,2.14.15,1.19,1), and that its
efficiency increases with decreasing $\alpha_4$.

Masaaki Sibuya and Hideo Toda (1957), using an expansion formula given
by Cadwell (1953), tabulate (to four decimal places) the probability density
function of the range w in normal samples of size n = 3(1)20 for
w = 0(0.05)7.65.

S. Babcock, A. Bede, A. Davies, B. Goldsmith and E. Torkelson (1958)
introduce the median, quasi-range method for control of lot average and lot
standard deviation for measurable lot quality characteristics which are
normally distributed. They tabulate factors for computing upper and lower
acceptance limits for the median and an upper acceptance limit for the optimal
quasi-range for samples of size n = 5(5)50.

D. E. Barton and D. J. Casley (1958) propose a quick estimate of the
linear regression coefficient of y on x in a bivariate sample $(x_i, y_i)$,
i = 1, 2, ..., n, which they obtain by dividing the difference of the means of
the k largest and the k smallest of the x's into the difference of the means of
the corresponding y's. For large samples from a bivariate normal population,
the maximum efficiency (81 per cent) is attained when k = 0.27n. For small
samples the efficiency lies between 70 per cent and 80 per cent when k is
between one-third and one-quarter of n.
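The quick estimate is easy to state in code. The sketch below follows the
verbal description only (difference of the y-means for the k largest and k
smallest x's, divided by the corresponding difference of x-means); the function
name, the tie-breaking by y when x-values coincide, and the data are
assumptions made for illustration.

```python
from statistics import mean


def quick_slope(xs, ys, k):
    """Quick slope estimate from the k smallest and k largest x-values."""
    pairs = sorted(zip(xs, ys))            # order the pairs by x (ties broken by y)
    n = len(pairs)
    if not 1 <= k <= n // 2:
        raise ValueError("k must satisfy 1 <= k <= n/2")
    low, high = pairs[:k], pairs[-k:]
    dx = mean(p[0] for p in high) - mean(p[0] for p in low)
    dy = mean(p[1] for p in high) - mean(p[1] for p in low)
    if dx == 0:
        raise ValueError("the two x-groups have equal means")
    return dy / dx


xs = list(range(1, 11))                    # made-up data lying exactly on y = 2x + 1
ys = [2 * x + 1 for x in xs]
k = round(0.27 * len(xs))                  # about 27 per cent of n, here k = 3
print(quick_slope(xs, ys, k))
```

On exactly linear data the estimate reproduces the true slope; on noisy data it
trades some efficiency for the saving in arithmetic.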

Philip G. Carlson (1958) obtains a recurrence formula for $E(w_{2n+1})$, the
expected value of the range of a sample of size 2n+1, in terms of
$E(w_{n+i+1})$ for i = 1, 2, ..., n-1.

Ferrell (1958) suggests a method for computing control limits for samples
from a strongly skewed universe which can be approximated by a lognormal
distribution. The geometric range R = Max/Min is used in place of the range and
the geometric midrange $M = \sqrt{\text{Max} \times \text{Min}}$ in place of
the mean. The author describes corresponding changes in the computation of
limits. This method accepts the skewness of the universe and allows a search
for other assignable causes of variation.
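The two statistics are simply the ordinary range and midrange computed on the
logarithmic scale and transformed back, since log(Max/Min) is the range of the
logs and log of the square root of Max times Min is their midrange. A minimal
sketch with hypothetical function names and made-up data follows; the control
limit factors themselves are not reproduced here.

```python
from math import sqrt


def geometric_range(sample):
    """Max/Min of a sample of strictly positive measurements."""
    lo, hi = min(sample), max(sample)
    if lo <= 0:
        raise ValueError("all measurements must be positive")
    return hi / lo


def geometric_midrange(sample):
    """Square root of Max * Min of a sample of strictly positive measurements."""
    lo, hi = min(sample), max(sample)
    if lo <= 0:
        raise ValueError("all measurements must be positive")
    return sqrt(hi * lo)


subgroup = [2.0, 4.0, 8.0, 3.0]            # made-up subgroup from a skewed process
print(geometric_range(subgroup), geometric_midrange(subgroup))
```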

Harter (1958) discusses the use of simple quasi-ranges in estimating the
standard deviation of normal, rectangular and exponential populations. For
the normal population, he tabulates the expected value, variance and standard
deviation of the r-th quasi-range for samples of size n for r = 0(1)8 and
n = (2r+2)(1)100. For each pair of values of r and n, he also tabulates the
efficiency of the unbiased estimator of population standard deviation based on
one sample quasi-range. He also considers estimators based on a linear
combination of two quasi-ranges, and gives a method for determining the
weighting factor which maximizes the efficiency. The most efficient unbiased
estimators based on one quasi-range for n = 2(1)100 and on linear combinations
of two adjacent quasi-ranges and of any two quasi-ranges ($r < r' \le 8$) for
n = 4(1)100 are tabulated, along with their efficiencies. These estimators are
compared with those of Grubbs and Weaver (1947) based on group ranges, and
their use is illustrated by an example. For rectangular and exponential
populations, the most efficient unbiased estimators based on one quasi-range
are tabulated, together with their efficiencies and the bias when estimators
which assume normality are used.

Bernard Ostle and J. M. Wiesen (1958) express the distribution of the
range of a sample of size n from a right triangular population in terms of the
range of the population, and apply the result to an acceptance sampling problem.

Plackett (1958) examines the methods used by the ancient Babylonian and
Greek astronomers in estimating parameters of observational data, and finds no
evidence that they made use of the arithmetic mean of a group of comparable
observations. He does trace its use as far back as the late sixteenth century,
when Tycho Brahe applied it to astronomical observations in order to eliminate
systematic errors. The concept of the mean as a more precise value than a single
measurement was already known to de Moivre, Flamsteed and Maupertuis early in
the eighteenth century, but remained controversial until the second half of
that century, when it was demonstrated conclusively by Simpson (1756, 1757)
and Lagrange (1774), both of whom made use of results due to de Moivre. The
author closes with an account of the work of Simpson and Lagrange, which we
have already examined.

Rider (1958) considers the family of density functions
$f(x) = c(1 + |x - \theta|^k)^{-h}$, $-\infty < x < \infty$, where the case
h = 1, k = 2 is the well-known Cauchy density function, and compares the
efficiency of the sample mean and the sample median as estimators of $\theta$
for various other values of h and k.

Jean Geffroy (1959) makes notable contributions to the theory of extreme
values, including the proof of various results concerning stability in
probability and almost complete stability of midrange, quasi-midranges, and
range of samples.

Gumbel (1959) expresses the asymptotic distribution of the reduced m-th
range, which is a certain linear transform of the m-th range, in terms of
the previously derived asymptotic distribution of the reduced range, and
calculates its moments. For m close to n/2, where n is the sample size, he
shows that the m-th range is asymptotically normally distributed and gives its
mean and standard deviation.

Hermann Hansel (1959) summarizes the results of investigations by Tippett
(1925), E. S. Pearson (1932), Behrens (1956) and others on the use of range
for the estimation of measures of variability. He points out that use of the
range makes possible short-cut methods of ascertaining standard deviation,
with only a slight loss of accuracy, which are applicable in every branch of
biology.

Harter (1959) gives a revised and condensed version of the material in
his earlier report [Harter (1958)] on the use of sample quasi-ranges in
estimating population standard deviation. He points out that the standard
deviation of an exponential population whose lower limit (location parameter)
is known can be estimated more efficiently from a single order statistic than
from a quasi-range.

Harter and Donald S. Clemm (1959) give a description of the computation
and use of tables of the probability integral, percentage points and moments
of the range for samples from a normal distribution. They include the following
tables: (1) an eight-decimal-place table of the probability integral of the
(standardized) range, $W = w/\sigma$, at intervals of 0.01, for samples of size
$n = 2(1)20(2)40(10)100$; (2) a six-decimal-place table of percentage points of
the range for the same values of $n$ and cumulative probability $P$ = .0001,
.0005, .001, .005, .01, .025, .05, .1(.1).9, .95, .975, .99, .995, .999, .9995,
.9999; (3) a table of moments of the range [mean to 10 decimal places (11
significant figures), variance to 10DP (10SF), $\alpha_3$ to 8DP (8SF) and
$\alpha_4$ to 7DP (8SF)] for samples of size $n = 2(1)100$.

Bernard Ostle and George P. Steck (1959) prove that the symmetry of the
parent population implies that the sample mean and the sample range are
uncorrelated, and construct an example to show that the converse is not true.
They present a necessary and sufficient condition that the correlation between
the mean and the range be positive (negative). They also prove that the
symmetry of the parent population implies that the sample range and midrange
are uncorrelated.
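
The Ostle and Steck result can be illustrated by simulation. The following is a minimal sketch, not taken from their paper: the function names, the sample size n = 5, the number of replications and the seed are all assumptions made for illustration. It estimates the correlation between the sample mean and the sample range for a symmetric (normal) parent and for a skewed (exponential) parent.

```python
import random

def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def mean_range_correlation(draw, n=5, reps=20000, seed=1):
    """Estimate corr(sample mean, sample range) by simulation.

    draw(rng) returns one observation from the parent population.
    """
    rng = random.Random(seed)
    means, ranges = [], []
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        means.append(sum(sample) / n)
        ranges.append(max(sample) - min(sample))
    return pearson(means, ranges)

# Symmetric parent: correlation should be close to zero.
r_normal = mean_range_correlation(lambda rng: rng.gauss(0.0, 1.0))
# Skewed parent: correlation should be clearly positive.
r_expon = mean_range_correlation(lambda rng: rng.expovariate(1.0))
print(f"normal: {r_normal:+.3f}, exponential: {r_expon:+.3f}")
```

A near-zero estimate for the normal parent is consistent with the theorem but does not prove it; the simulation only shows the contrast between the symmetric and the skewed case.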

K.C.S. Pillai and Benjamin P. Tienzo (1959) develop, in series form, for
$n = 3, 4, 5$ and $\nu = 10$, the distribution of the standardized extreme
deviate from the sample mean, $\max[(x_n - \bar{x})/\sigma,\ (\bar{x} - x_1)/\sigma]$,
and the corresponding studentized deviate,
$\max[(x_n - \bar{x})/s_\nu,\ (\bar{x} - x_1)/s_\nu]$, where
$x_1 \le x_2 \le \cdots \le x_n$ is an ordered sample of size $n$ from a normal
population with variance $\sigma^2$, $\bar{x}$ is the sample mean, and $s_\nu$
is the square root of an independent mean square estimate of $\sigma^2$ based
on $\nu$ degrees of freedom. Pillai (1959) tabulates the upper 5% and 1%
points of the studentized deviate for $n = 2(1)10, 12$ and $\nu = 1(1)10$, and
discusses the method of preparation of this table.

Rider (1959) derives the distribution of the quasi-range,
$W_r = x_{n-r} - x_{r+1}$, where $x_1 \le x_2 \le \cdots \le x_n$ are drawn at
random from an exponentially distributed population, and gives the moment
generating function and the cumulants of $W_r$. From these he shows that the
mean of $W_r$ slowly diverges with increasing sample size while the variance
approaches a finite value: for example, $\pi^2/6 = 1.6449$ for $r = 0$.
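
The behavior of the range ($r = 0$) can be checked numerically. The sketch below relies on a standard fact that is not stated in the text above and is therefore an assumption here: for a unit exponential population the spacings between successive order statistics are independent exponentials, so the range has mean $\sum_{i=1}^{n-1} 1/i$ and variance $\sum_{i=1}^{n-1} 1/i^2$. The function name is hypothetical.

```python
import math

def exponential_range_moments(n):
    """Mean and variance of the range of n unit-exponential observations,
    using the independent-spacings representation (assumed, see lead-in)."""
    mean = sum(1.0 / i for i in range(1, n))
    var = sum(1.0 / i**2 for i in range(1, n))
    return mean, var

for n in (10, 100, 1000, 10000):
    mean, var = exponential_range_moments(n)
    print(n, round(mean, 4), round(var, 4))
print("limit of variance:", round(math.pi**2 / 6, 4))
```

The mean grows like a harmonic sum (roughly logarithmically in n), while the variance settles toward $\pi^2/6$.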

John Edward Walsh (1959) proposes a large-sample nonparametric criterion
for rejection of outlying observations. Let $x_1 \le x_2 \le \cdots \le x_n$ be
independent observations from continuous populations. The null hypothesis,
$H_0$, is that these observations all resulted from independent random
drawings from the same well-behaved population with unspecified shape. The
alternative hypothesis is $H_1$: the $i$ smallest observations are too small
[or the $i$ largest are too large] to be consistent with $H_0$, where $i$ is a
small number which should be specified without knowledge of the observations.
The alternative is accepted if a statistic of the form
$x_i - (1 + A)x_{i+1} + A x_k$ is negative, where $A > 0$, $k$ is the largest
integer contained in [...], and $n$ is sufficiently large. Similarly, the
alternative that the $i$ largest observations are too large is accepted if
$x_{n+1-i} - (1 + A)x_{n-i} + A x_{n+1-k}$ is positive. Two-sided tests are
obtained by combining these one-sided tests. Tchebycheff's inequality yields
an approximate upper bound for the significance level of the test for $A$
suitably chosen.

K. Weiler (1959) shows that if $a > 0$ is the smallest and $b$ is the largest
of $n$ values whose arithmetic and harmonic means are $\bar{x}$ and $H$,
respectively, then $0 \le (\bar{x} - H)/H \le (b - a)^2/4ab$, the first
equality holding only if all $n$ values are equal and the second only if half
of them have the value $a$ and the other half the value $b$. Moreover, since
$H \le g \le \bar{x}$, where $g$ is the geometric mean, the same inequality
holds for $(\bar{x} - g)/g$. Thus $\bar{x}$ differs little from $H$ or $g$ if
the $n$ values have a small range and all are far removed from zero.
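
Weiler's bound is easy to check numerically. The following minimal sketch (the function name and the data are made up for illustration) computes the relative excess of the arithmetic over the harmonic mean alongside the bound $(b - a)^2/4ab$, first for values with a small range far from zero and then for the extremal case in which half the values equal $a$ and half equal $b$.

```python
import random

def weiler_check(values):
    """Return ((mean - harmonic mean) / harmonic mean, Weiler's bound)."""
    n = len(values)
    a, b = min(values), max(values)
    mean = sum(values) / n
    harmonic = n / sum(1.0 / v for v in values)
    return (mean - harmonic) / harmonic, (b - a) ** 2 / (4 * a * b)

rng = random.Random(0)
sample = [rng.uniform(90.0, 110.0) for _ in range(20)]
gap, bound = weiler_check(sample)
print(f"gap = {gap:.6f}, bound = {bound:.6f}")

# Extremal case: half the values equal a, the other half equal b.
gap_eq, bound_eq = weiler_check([2.0, 2.0, 8.0, 8.0])
print(gap_eq, bound_eq)
```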

Frank J. Anscombe (1960) examines numerous criteria for the rejection of
outliers proposed during a period of more than a century. He suggests that
rejection rules should not be regarded as significance tests, as has usually
been the case, but as insurance policies. He makes a detailed study of the
effect of routine application of rejection criteria to replicate (especially
triplicate and quadruplicate) determinations of a single value, focusing
attention mainly on rules appropriate when the population standard deviation
$\sigma$ is known, but giving some attention to studentized rules. He examines
the following rules: Rule 0. For given $C$, reject every observation $y_i$
($i = 1, 2, \ldots, n$, where $n$ is the number of observations) such that
$|z_i| > C\sigma$, where $z_i = y_i - \bar{y}$. Estimate the mean $\mu$ by the
mean of the retained observations. Rule 1. For given $C$, reject $y_M$ if
$|z_M| > C\sigma$, where $M$ is the value such that $|z_M| \ge |z_i|$ for all
$i \ne M$; otherwise no rejections. Estimate $\mu$ by the mean of the retained
observations, thus $\hat{\mu} = \bar{y}$ if $|z_M| < C\sigma$,
$\hat{\mu} = \bar{y} - z_M/(n - 1)$ if $|z_M| > C\sigma$. Rule 2. Apply
Rule 1. If an observation is rejected, consider the remaining observations as
a sample of size $n - 1$ and apply Rule 1 again; and so on. Estimate $\mu$ by
the mean of the retained observations. The author finds Rule 0 unsatisfactory,
since a single outlier, if it outlies sufficiently, can cause the entire sample
to be rejected. He finds Rule 1 satisfactory for small samples ($n = 3$ or 4),
but since Rule 1 can reject only one outlier, Rule 2 must be considered for
larger samples which may contain more than one outlier.
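
Rules 1 and 2, as described above for known $\sigma$, can be sketched as follows. This is an illustrative reading of the verbal description, not Anscombe's own code; the function names, the data, and the choice $C = 2$ are assumptions.

```python
def rule1(obs, c, sigma):
    """Rule 1: reject at most the single observation with the largest
    absolute residual, and only if that residual exceeds c * sigma."""
    n = len(obs)
    mean = sum(obs) / n
    # Index M of the observation with the largest absolute residual.
    m = max(range(n), key=lambda i: abs(obs[i] - mean))
    if abs(obs[m] - mean) > c * sigma:
        return obs[:m] + obs[m + 1:]
    return list(obs)

def rule2(obs, c, sigma):
    """Rule 2: apply Rule 1 repeatedly until no observation is rejected."""
    kept = list(obs)
    while len(kept) > 1:
        reduced = rule1(kept, c, sigma)
        if len(reduced) == len(kept):
            break
        kept = reduced
    return kept

data = [9.8, 10.1, 10.0, 9.9, 14.0, 15.0]   # made-up sample, two wild values
for rule in (rule1, rule2):
    kept = rule(data, c=2.0, sigma=1.0)
    print(rule.__name__, kept, sum(kept) / len(kept))
```

With two discordant values in the sample, Rule 1 removes only the more extreme one, while Rule 2 goes on to remove the second, which is the behavior the text attributes to the two rules.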

Dixon (1960) considers various estimators of population mean and standard
deviation from censored normal samples. Among the estimators of the mean
considered are the Winsorized means, in which the magnitude of an extreme
observation which is unknown or poorly known (or suspected of being spurious)
is replaced by the next largest (or smallest) observation, as proposed by
Charles P. Winsor, instead of rejecting it entirely. Dixon finds that the
efficiency of Winsorized means, when balance is maintained by Winsorizing the
same number of observations at each extreme, is remarkably high relative to
that of the best linear systematic statistic.
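
A balanced Winsorized mean of the kind described above can be sketched as follows. The function name, the parameter `g` (the number of observations Winsorized at each extreme) and the data are assumptions made for illustration.

```python
def winsorized_mean(obs, g):
    """Mean after replacing the g smallest and the g largest observations
    by their nearest retained neighbours (balanced Winsorization)."""
    y = sorted(obs)
    n = len(y)
    if not 0 <= 2 * g < n:
        raise ValueError("need 0 <= 2g < n")
    low, high = y[g], y[n - g - 1]
    # Pull every observation outside [low, high] back to the nearer limit.
    clipped = [min(max(v, low), high) for v in y]
    return sum(clipped) / n

data = [9.8, 10.1, 10.0, 9.9, 10.2, 25.0]   # made-up sample, one wild value
print(sum(data) / len(data))      # ordinary mean, pulled up by 25.0
print(winsorized_mean(data, 1))   # 25.0 replaced by 10.2, and 9.8 by 9.9
```

Unlike outright rejection, the sample size stays at $n$; only the magnitudes of the extreme observations are changed.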

Harter (1960) gives a condensed version of the material on the range of
samples from a normal population contained in the report by Harter & Clemm
(1959). The table of the probability integral is omitted, but those of the
percentage points (abridged) and moments of the range are included, along with
a section on interpolation in the tables which is not found in the report.

Robert Vincent Hogg (1960a) defines odd location statistics $T$ and even
location-free statistics $S$ by
$T(x_1 + h, x_2 + h, \ldots, x_n + h) = T(x_1, \ldots, x_n) + h$,
$T(-x_1, -x_2, \ldots, -x_n) = -T(x_1, \ldots, x_n)$,
$S(x_1 + h, x_2 + h, \ldots, x_n + h) = S(x_1, x_2, \ldots, x_n)$,
$S(-x_1, -x_2, \ldots, -x_n) = S(x_1, x_2, \ldots, x_n)$ for all real values of
$h$. He proves that the symmetry of a probability density function implies that
the correlation between an odd location statistic and an even location-free
statistic is zero. This generalizes two special results of Ostle & Steck (1959).
The sample mean, the sample median, and the sample midrange are odd location
statistics, while the sample variance, the sample range, the sample
quasi-ranges, the sample mean deviation from the sample median, and any ratio
of two of these statistics are even location-free statistics. Hogg (1960b)
proves that if the distribution is symmetric about $\theta$ and $E(T)$ exists,
then $E\{T \mid S = s\} = \theta$, together with a multivariate extension
useful in obtaining unbiased estimators of $\theta$, e.g., estimators formed
from $M_1$, $M_2$, $R_1$ and $R_2$, where $M_1$ and $M_2$ are the medians and
$R_1$ and $R_2$ the ranges of random samples from two distributions both of
which are symmetric about $\theta$.

William H. Kruskal (1960) gives an exposition of the problem of handling
wild observations, or outliers. He suggests that such observations should be
reported even though they may be excluded from the analysis; moreover, they
should not be discussed simply in terms of the propriety of including them
in the analysis, of which the author gives illustrations, but treated as
opportunities to learn something new. He classifies outliers into three
categories according as there is (a) a priori knowledge, (b) a posteriori
knowledge, or (c) no knowledge of a variant causal pattern. Those in the third
category are the ones which cause the trouble, and the author expresses
dissatisfaction with existing approaches to handling them.

Rider (1960a) compares exact variances with the values obtained by using
the formula for asymptotic variance for the medians of small samples
($n = 1, 3, 5, 7$) from exponential, normal, cosine, parabolic, rectangular and
inverted parabolic populations, which have standard fourth moment 9, 3, 2.19,
2.14, 1.8 and 1.61 respectively. He finds that the adequacy of the asymptotic
formula increases with $\alpha_4$. Rider (1960b) makes a similar comparison
for the variance of the median of samples of size $2k + 1$, for $k = 3(1)15$,
from a Cauchy distribution.
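
A comparison of this kind can be imitated for the normal case. The sketch below is not Rider's computation: it replaces his exact variances by Monte Carlo estimates, and it assumes the usual large-sample formula $1/(4 n f(m)^2)$ for the variance of the sample median, where $f(m)$ is the parent density at the population median. The function names, replication count and seed are hypothetical.

```python
import math
import random
import statistics

def asymptotic_median_variance(n, density_at_median):
    """Large-sample variance of the sample median: 1 / (4 n f(m)^2)."""
    return 1.0 / (4.0 * n * density_at_median ** 2)

def simulated_median_variance(n, reps=20000, seed=2):
    """Monte Carlo variance of the median of n standard normal observations."""
    rng = random.Random(seed)
    medians = [statistics.median([rng.gauss(0.0, 1.0) for _ in range(n)])
               for _ in range(reps)]
    return statistics.pvariance(medians)

f0 = 1.0 / math.sqrt(2.0 * math.pi)   # standard normal density at its median
for n in (1, 3, 5, 7):
    print(n, round(simulated_median_variance(n), 4),
          round(asymptotic_median_variance(n, f0), 4))
```

For the standard normal the asymptotic value reduces to $\pi/(2n)$, so the two columns can be compared directly for each of the small odd sample sizes considered in the text.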

Tukey (1960) surveys sampling from contaminated distributions and reaches
a number of conclusions, of which the following are relevant to the present
study: (1) "In large samples the sample mean is not nearly so safe an indicator
of location as is the mean of the observations which remain after a small
percentage of the highest, and an equal percentage of the lowest, have been
set aside (use of a lightly truncated mean)." (2) "In slightly large samples,
there is ground for doubt that the use of the variance (or the standard
deviation) as a basis for estimates of scaling type is ever truly safe."
(3) "In moderately or very large samples, *** the variance or standard
deviation is safely used only [for certain purposes which the author
specifies]." (4) "Nearly [...] estimates of scale and location entirely
useless." (5) "If contamination is a real possibility (and when is it not?),
neither mean nor variance is likely to be a wisely chosen basis for making
estimates from a large sample." (6) "As an interim measure, the use of
truncated variances is likely to be quite satisfactory." (7) "In smaller
samples, the use of the mean deviation may be a frequently useful compromise."
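
The first of these conclusions can be illustrated by a small simulation. This is a minimal sketch under assumed settings that do not come from Tukey's paper: a parent that is standard normal with probability 0.9 and normal with three times the standard deviation with probability 0.1, samples of size 20, and a truncated mean that sets aside the two lowest and two highest observations.

```python
import random
import statistics

def trimmed_mean(obs, g):
    """Mean of what remains after dropping the g lowest and g highest values."""
    y = sorted(obs)
    if not 0 <= 2 * g < len(y):
        raise ValueError("need 0 <= 2g < n")
    kept = y[g:len(y) - g]
    return sum(kept) / len(kept)

def contaminated_sample(rng, n, eps=0.10, scale=3.0):
    """n draws from the mixture (1 - eps) N(0, 1) + eps N(0, scale^2)."""
    return [rng.gauss(0.0, scale if rng.random() < eps else 1.0)
            for _ in range(n)]

def compare(n=20, g=2, reps=5000, seed=3):
    """Monte Carlo variances of the mean and of the g-times trimmed mean."""
    rng = random.Random(seed)
    means, trimmed = [], []
    for _ in range(reps):
        s = contaminated_sample(rng, n)
        means.append(sum(s) / n)
        trimmed.append(trimmed_mean(s, g))
    return statistics.pvariance(means), statistics.pvariance(trimmed)

var_mean, var_trimmed = compare()
print(f"var(mean) = {var_mean:.4f}, var(trimmed mean) = {var_trimmed:.4f}")
```

Under these assumed settings the truncated mean should show the smaller sampling variance; the size of the gain depends on the amount and severity of contamination chosen.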

Anscombe (1961) considers four statistics designed to reveal certain types
of departure from the ideal statistical conditions (independent and normally
distributed residuals with zero mean and constant variance) under which the
least-squares method of estimating the parameters in a regression equation is
unquestionably satisfactory. He gives information about the distributions of
these statistics under the null hypothesis of ideal conditions, but states
that a thorough investigation of the appropriateness of the least-squares
method would have to go further, and would encounter grave difficulties. He
states that for most fields of observation, outliers may be expected to occur,
so that significance tests to determine whether extreme observations do in
fact occur with frequency incompatible with the ideal conditions may be
irrelevant. He writes: "The day-to-day problem with outliers *** is not: is
the ordinary least-squares method appropriate? but: how should [it] be
modified? not: do gross errors occur sometimes? but: how can we protect
ourselves from the gross errors that no doubt occasionally occur? The type of
insurance usually adopted (it is not the only kind conceivable) is to reject
completely any observation whose residual exceeds a tolerance calculated
according to some rule, and then apply the least-squares method to the
remaining observations." He refers to his own earlier paper [Anscombe (1960)]
containing suggestions for choosing a routine rejection rule and to the
Bayesian approach of de Finetti (1961).

Eisenhart (1961, 1962) summarizes the work of Boscovich on the combination
of observations. He points out that Boscovich was the first to devise a
completely objective procedure for uniquely determining the coefficients of a
two-parameter line $y = a + bx$ from a set of three or more observational
points. He also notes that Boscovich's procedure, like the median, is
comparatively insensitive to the more extreme of a set of observations, and is
especially well suited to summarizing the linear trend evidenced by a more or
less heterogeneous set of data compiled from various sources, or obtained by a
measurement procedure that has a tendency to yield occasional discordant
values. Besides Boscovich's geometric algorithm, the author discusses the
algebraic formulation of Boscovich's method by Laplace and the modification by
Edgeworth, who advocated unrestricted minimization of the sum of absolute
values of the residuals, dropping the restriction that their algebraic sum
must be zero, thus in effect requiring that the line pass through the double
median point $(\tilde{x}, \tilde{y})$ instead of the center of gravity
$(\bar{x}, \bar{y})$ of the observations. He also mentions the more recent work
of Rhodes, Singleton, Harris, and Bejar, as well as the classical work on rival
methods, including least squares.
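
Boscovich's criterion (minimum sum of absolute residuals, subject to the residuals summing to zero) can be sketched in a few lines. The sketch assumes the standard reduction, in the spirit of the algebraic formulation mentioned above: the zero-sum condition forces the line through the center of gravity, and the slope is then a weighted median of the slopes $(y_i - \bar{y})/(x_i - \bar{x})$ with weights $|x_i - \bar{x}|$. The function names and data are hypothetical.

```python
def weighted_median(values, weights):
    """Smallest value at which the cumulative weight reaches half the total."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= half:
            return value
    raise ValueError("empty input")

def boscovich_line(xs, ys):
    """Fit y = a + b x minimizing sum |residual| subject to sum residual = 0."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    # The zero-sum constraint forces the line through the centroid, so only
    # the slope is free; it is a weighted median of the centred slopes.
    slopes, weights = [], []
    for x, y in zip(xs, ys):
        dx = x - xbar
        if dx != 0:
            slopes.append((y - ybar) / dx)
            weights.append(abs(dx))
    b = weighted_median(slopes, weights)
    return ybar - b * xbar, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0, 40.0]   # last point is discordant
a, b = boscovich_line(xs, ys)
print(f"a = {a:.4f}, b = {b:.4f}")
```

Because the fitted line must still pass through the center of gravity, a discordant point shifts the fit through $\bar{y}$ even though the slope itself is chosen by a median-type rule.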

Thomas S. Ferguson (1961a) derives locally best tests, based on the
sample skewness $a_3$ and the sample kurtosis $a_4$ respectively, of the null
hypothesis $H_0$ that a number of observations were all drawn at random from
the same normal population $N(\mu, \sigma^2)$ against the alternatives $H_1$
that one or more outliers came from $N(\mu + \lambda\sigma, \sigma^2)$ and
$H_2$ that one or more outliers came from [...]. He compares the power of
these tests with those proposed by Grubbs (1950) and Dixon (1950). Ferguson
(1961b) surveys the literature on the rejection of outliers from the time of
Peirce (1852) to date, with special emphasis on the period after 1950. He
devotes one section to the relation between rejection and estimation, in which
he discusses trimming and Winsorization.

Bruno de Finetti (1961) proposes a Bayesian approach to the treatment of
outlying observations in which observations are never rejected, though the
influence of outlying observations on the final distribution may be weak or
almost negligible. He distinguishes three cases, in which the errors are (a)
independent, (b) exchangeable, or (c) partially exchangeable, where
independence means "independence with known error distribution",
exchangeability translates "independence with unknown error distribution", and
partial exchangeability translates "independence with an unknown conditional
error related to visible features of the individual observations."

Harter (1961) gives examples of the use of tables of percentage points
of the range [Harter & Clemm (1959) and Harter (1960)], including an
application to rejection of outliers based on use of the test statistic
$W = w/\sigma$ (the standardized range) as proposed by Dixon (1950).

M. G. Kendall (1961) reports the results of a historical study of the
work of Daniel Bernoulli on the method of maximum likelihood. He sets the
stage by reviewing the earlier contributions of Cotes (1722), Euler (1749),
Mayer (1750), [Maire &] Boscovich (1755), Simpson (1756, 1757), Lagrange
(1774), and Laplace (1774). Kendall's comments are followed by English
translations of the paper by Bernoulli (1778) and the related one by Euler
(1778), which Kendall regards as less valuable.

C. P. Quesenberry and H. A. David (1961) point out that one may approach
the problem of testing for outliers differently depending on the object in
view. If the primary interest is in pruning the observations so as to secure
a more accurate analysis of what is left (e.g. to obtain the most reliable
estimate of a mean) the criterion may be the effect on the standard error of
estimate, whereas if the interest lies in identifying the exceptional
observations so as to create a new insight into the phenomena under study, the
criterion may be the risk of wrongly deciding whether an observation is
exceptional or not. The authors take the second point of view. They modify the
test statistics proposed by Nair (1948) and by Halperin et al. (1955) for
one-sided and two-sided tests, respectively, by replacing the estimate
$s_\nu$, derived from the independent estimate of population variance, in the
denominator by the pooled estimate
$s^* = \{[(n - 1)s^2 + \nu s_\nu^2]/(n + \nu - 1)\}^{1/2}$, which makes use
also of the internal estimate $s^2$ from the sample of size $n$. They compute
and tabulate percentage points of the modified statistics.

Pranab Kumar Sen (1961a) studies some properties of the asymptotic variances
of the sample quantiles and quasi-midranges and discusses the role of the
sample median. He shows that, among the class of sample quantiles, the sample
median has asymptotically the smallest variance only under somewhat restrictive
regularity conditions; while among the class of sample quasi-midranges, the
sample median has asymptotically the smallest variance only for a class of
non-regular parent density functions. He tabulates the relative efficiency of
the sample median with respect to the optimum quasi-midrange for nine common
parent distributions. Sen (1961b) studies the stochastic convergence of the
sample extreme values for distributions having a finite end-point and the
asymptotic convergence of their moments to the corresponding ones of their
limiting distributions. He applies the results to the estimation of the
population midrange from the sample midrange.

K. S. Srikantan (1961) treats the general problem of testing a regression
model against the alternative hypothesis of a single outlier. He develops test
criteria which are generalizations of those of Grubbs (1950) and of Pearson &
Chandra Sekar (1936), the latter based on the work of Thompson (1935), and
tabulates their 5% and 1% critical values for regression on $m$ variables
($m = 1, 2, 3$).

J. Tiago de Oliveira (1961) gives a general proof of the asymptotic
independence of the sample mean and extremes for an absolutely continuous
distribution satisfying the conditions of Gumbel (1946) for asymptotic
independence of the extremes.

Simeon M. Berman (1962) shows, under general conditions, that if the
standardized largest observation has a limiting distribution, then the
studentized largest observation has the same limiting distribution and the
studentized largest absolute deviate has a limiting distribution of the same
form.

Giovanni Cancelliere (1962) gives a new proof of the theorem that the sum
of the absolute values of the deviations of a set of observations
$x_1, \ldots, x_n$ from a number $x$ is a minimum when $x$ is the median of the
$x_i$ ($i = 1, \ldots, n$).
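
The theorem can be illustrated (not proved) by a brute-force search. The sketch below uses made-up data and a hypothetical helper name; it evaluates the sum of absolute deviations on a fine grid of candidate values and confirms that the minimizing grid point coincides with the sample median.

```python
import statistics

def sum_abs_dev(data, x):
    """Sum of absolute deviations of the observations from the number x."""
    return sum(abs(v - x) for v in data)

data = [2.0, 3.5, 1.0, 9.0, 4.0]
med = statistics.median(data)
grid = [i / 100.0 for i in range(0, 1001)]      # candidates 0.00 .. 10.00
best = min(grid, key=lambda x: sum_abs_dev(data, x))
print(med, best, sum_abs_dev(data, med))
```

For an even number of observations the minimum is attained on the whole interval between the two middle values, so a grid search would then return only one of several minimizers.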

Odoardo Cucconi (1962) proposes a criterion for the rejection of outlying
observations from a $k$-dimensional distribution ($k = 1, 2, 3, \ldots$) which
is assumed to be $k$-variate normal, but possibly contaminated by spurious
observations. This criterion is a generalization of that of Thompson (1935),
to which it reduces for $k = 1$. The criterion is
$[N/(N-1)] \sum_{i=1}^{k} \sum_{j=1}^{k} (A_{ij}/A)({}_s x_i - m_i)({}_s x_j - m_j)$,
$s = 1, 2, \ldots, N$, where ${}_r x_i$ ($r = 1, 2, \ldots, N$;
$i = 1, 2, \ldots, k$) is the $i$-th coordinate of the $r$-th observation, $A$
is the value of the determinant of the matrix whose element $c_{ij}$
($= c_{ji}$) is given by the sum of the products of the deviations of the $x_i$
and $x_j$ from their respective means $m_i$ and $m_j$, and $A_{ij}$ is the
cofactor of $c_{ij}$. In order to test the hypothesis $H_0$ that the $s$-th
($s = 1, 2, \ldots, N$) observation is homogeneous with the others, the value
of the criterion is calculated and compared with the critical value
$r_\alpha$. The values of $r_\alpha$, obtained from the incomplete Beta
distribution, are tabulated to two decimal places for $\alpha = 0.05, 0.01$;
$k = 1, 2, 3, 4$; and various values of $N$ ranging from 5 to 100.

Harter (1962), as part of a study of the ratio of two ranges not otherwise
relevant to the present topic, tabulates (to 8DP) the probability density
function of the standardized range $W = w/\sigma$ for samples of size
$n = 2(1)16$ from $N(\mu, 1^2)$, at intervals of 0.01 in $W$. This table
represents a considerable improvement over the earlier 4DP table of Sibuya &
Toda (1957) at intervals of 0.05 in $W$. Harter's values for $W$ a multiple of
0.05, when rounded to 4DP, agree with those of Sibuya & Toda except for an
occasional discrepancy of one unit in the last place.

Bruce Marvin Hill (1962) proposes a test of linearity versus convexity of
a median regression curve. Specifically, he proposes to test
$H_0$: $Y_i = \alpha + \beta x_i + e_i$ against $H_1$:
$Y_i = \phi(x_i) + e_i$, $i = 0, 1, \ldots, n$, where $\alpha$, $\beta$ and
$\phi$ are unspecified and $\phi(x)$ is a nonlinear convex function, the $e_i$
are independent identically distributed random variables with median zero and
a continuous density function $f(e)$ such that $f(0) > 0$, and the $x_i$ are
fixed and known. The test involves estimating a line from a central subset of
the observations by the procedure (using medians) of Brown & Mood (1951),
making a weighted count of the number of remaining observations lying above
the line, and rejecting $H_0$ if this number, $P_n$, is too large. The author
gives the asymptotic distribution of $P_n$ under $H_0$, from which he obtains
critical values of $P_n$, and the asymptotic distribution of $P_n$ under
$H_1$, from which he obtains the power of the test. The test can be adapted to
two-sided alternatives.

Alex Rosengard (1962) seeks to unify the existing theory of limiting
distributions of the mean and of the extremes of a sample by studying the
limiting joint distribution of these three statistics.

Sibuya  (1962)  examines  an  asymptotic  formula  for  the  expected  value  of  the 
median  of  the  ranges  of  N  independent  normal  samples  each  of  size  n.  He  compares 
the  approximate  values  obtained  from  this  formula  with  exact  values  obtained  by 
numerical integration for $n = 2$, $N = 3(2)17$.

M. M. Siddiqui (1962) makes a numerical study of the method proposed by
Chu & Hotelling (1955) for approximating the moments of the sample median by
use of a Taylor series expansion of the inverse of the cumulative distribution
function. He applies this method to various distributions and presents the
results in tabular form. They show that the relative error decreases
monotonically with sample size, and generally support the author's expectations
that properties of the parent population which contribute to rapidity of
convergence are finite range, a low value of kurtosis, and symmetry.

Tukey (1962), in a study of the future of data analysis, devotes considerable
attention to "spotty data" resulting from long-tailed fluctuation-and-error
distributions, occasional causes with large effects, or irregularly
non-constant variability. He offers a number of possible cures or palliatives,
including trimming and Winsorizing samples, which result in a small loss in
efficiency when the samples come from a normal distribution but a large gain in
efficiency when they come from a very long-tailed one (e.g. the Cauchy
distribution). For two-dimensional arrays he proposes graphical methods to be
applied to the residuals, including (1) a conventional plot on normal
probability paper, (2) a modified plot of $z_i = (y_i - \tilde{y})/a_{i|n}$
against $i$, where the $y_i$ ($i = 1, 2, \ldots, n$) are the residuals,
$\tilde{y}$ is their median, and $a_{i|n}$ is the standard normal deviate
corresponding to the cumulative probability of the $i$-th order statistic of a
sample of size $n$, and (3) an arithmetic analogue of the modified plot called
FUNOP (FUll NOrmal Plot). He proposes a specific procedure called FUNOR-FUNOM
(FUll NOrmal Rejection-FUll NOrmal Modification) because it uses FUNOP and
first rejects and then modifies deviations. This procedure is a sort of
two-dimensional analogue of trimming and Winsorization, since it first rejects
(trims) the most extreme deviations (those greater than a first prechosen
bound) and then reduces to a second prechosen bound the remaining deviations
exceeding the latter value.
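
The quantities $z_i$ of the modified plot can be computed as in the following sketch. Two details are assumptions, since the text above does not give them: the plotting position $(3i - 1)/(3n + 1)$ used for the cumulative probability of the $i$-th order statistic, and the restriction to the outer thirds of the ordered sample (to avoid dividing by deviates near zero). The function name and data are hypothetical.

```python
from statistics import NormalDist, median

def funop_z(residuals):
    """Return (i, y_i, z_i) for residuals in the outer thirds of the ordered
    sample, where z_i = (y_i - median) / a_i and a_i is a normal deviate."""
    y = sorted(residuals)
    n = len(y)
    if n < 3:
        raise ValueError("need at least three residuals")
    centre = median(y)
    out = []
    for i, value in enumerate(y, start=1):
        # ASSUMPTION: use only the lowest and highest thirds of the sample,
        # where the normal deviate a_i is safely away from zero.
        if not (i <= n / 3 or i > 2 * n / 3):
            continue
        p = (3 * i - 1) / (3 * n + 1)        # ASSUMPTION: plotting position
        a = NormalDist().inv_cdf(p)
        out.append((i, value, (value - centre) / a))
    return out

residuals = [-1.2, -0.7, -0.3, -0.1, 0.0, 0.2, 0.4, 0.9, 6.0]  # made-up data
for i, value, z in funop_z(residuals):
    print(i, value, round(z, 3))
```

If the residuals behaved like a normal sample, the $z_i$ would all be of roughly the same size (an estimate of scale); a residual whose $z_i$ stands well above the rest is a candidate for rejection or modification.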

Anscombe and Tukey (1963) emphasize the importance of examination and
analysis of residuals, which may furnish information about the presence of
outliers and/or about inappropriateness of the fitted curve or the scale of
measurement. They suggest use of both graphical and analytic techniques, but
suggest beginning with the former, preferably in the form of a scatter diagram
in which residuals are plotted against fitted values. When the most prominent
sort of misbehavior of the data has been diagnosed, it is important, they say,
to deal with it before seeking out other sorts of misbehavior. If outliers
are detected they may be rejected outright or modified by Winsorization or by
assigning them smaller weights which decrease smoothly as the size of the
residual increases. If the signs of the residuals show a definite pattern, this
may indicate that the type of curve fitted is inappropriate. If the spread of
the residuals is correlated with the fitted values, this may indicate that
the scale of measurement is inappropriate. These two phenomena are, of course,
not independent; e.g. a straight line may adequately fit the data resulting
from a transformation of scale, even though there was evidence of
curvilinearity on the original scale of measurement.

Eisenhart, Deming & Martin (1963) give tables to accompany their earlier
abstracts [Eisenhart, Deming & Martin (1948a,b)] concerning the distributions
of the median and the mean of samples from various populations. The abstracts
are reprinted (slightly edited).

Harter (1963) tabulates (to 8DP) the probability integral of the
standardized $r$-th quasi-range $W_r = w_r/\sigma$, at intervals of 0.01 in
$W_r$, for $r = 0(1)8$ and samples of size $n = (2r+2)(1)20(2)40(10)100$ from
a normal population with unit variance, $N(\mu, 1^2)$, obtained by numerical
integration. He also tabulates (to 6DP) percentage points of $W_r$
corresponding to cumulative probability $P$ = 0.0001, 0.0005, 0.001, 0.005,
0.01, 0.025, 0.05, 0.1(0.1)0.9, 0.95, 0.975, 0.99, 0.995, 0.999, 0.9995,
0.9999 for the same values of $r$ and $n$, obtained by inverse interpolation
in the table of the probability integral.

Joseph L. Hodges, Jr. and Erich L. Lehmann (1963) propose various estimates
of location based on rank tests. Of particular interest is the estimate which
is the median of the $N(N+1)/2$ averages $(z_i + z_j)/2$ of the $i$-th and
$j$-th order statistics ($i \le j$) of a sample of size $N$. The authors show
that this estimate, which we shall call the Hodges-Lehmann estimate, has, under
specified conditions, certain properties of regularity, invariance, symmetry,
median unbiasedness and asymptotic normality. For samples from a normal
population, it has asymptotic efficiency $3/\pi = .955$ relative to the sample
mean.
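
The Hodges-Lehmann estimate as defined above can be computed directly from its definition. The following is a minimal sketch for small samples (it forms all $N(N+1)/2$ averages explicitly, which is quadratic in $N$); the function name and data are assumptions made for illustration.

```python
from statistics import median

def hodges_lehmann(sample):
    """Median of the N(N+1)/2 pairwise averages (z_i + z_j)/2 with i <= j."""
    z = sorted(sample)
    n = len(z)
    averages = [(z[i] + z[j]) / 2.0 for i in range(n) for j in range(i, n)]
    return median(averages)

data = [1.0, 2.0, 3.0, 4.0, 100.0]   # made-up sample with one wild value
print(sum(data) / len(data), hodges_lehmann(data))
```

In this example the single wild value moves the sample mean far from the bulk of the data, while the median of pairwise averages stays near it.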

[...] $(X_i - \bar{X})/s$ [Thompson (1935)], where $X_i$ is a random
observation from a sample with mean $\bar{X}$ and $s$ is an independent root
mean square estimate of the population variance, in case the distribution of
the underlying population is exponential. He discusses its use in obtaining
estimates of functions of the parameters and the probability distributions of
the reduced $i$-th order statistic and the reduced range and in tests for
exponentiality or the presence of outliers.

Gerald J. Lieberman and Rupert G. Miller, Jr. (1963), in a study of
simultaneous tolerance intervals in regression, include a section on detection
and correction of outliers using simultaneous confidence principles.

B. S. Niven (1963), recalling that the sample mean and the sample range
are uncorrelated if the parent population is symmetric [Us tie & Steck (1959)
and Hogg (1960)] and independent if it is normal [Lord (1947); see also Daly
(1946)], gives a method suitable for the calculation of their joint
distribution when the sample size is small. She gives specific results for
samples of sizes three and four from rectangular and exponential populations,
and recalls [McKay & Pearson (1933)] that for samples of size 3 from a normal
population, the distribution of the range may be written in terms of the
normal probability integral.

John W. Tukey and Donald H. McLaughlin (1963) discuss trimming and
Winsorization. Given $n$ ordered observations $y_1 \le y_2 \le \cdots \le y_n$,
their arithmetic mean is $\bar{y} = (y_1 + y_2 + \cdots + y_n)/n$, their
$g$-times (symmetrically) trimmed mean is
$y_{Tg} = (y_{g+1} + y_{g+2} + \cdots + y_{n-g})/(n-2g)$, and their $g$-times
(symmetrically) Winsorized mean is
$y_{Wg} = (g\,y_{g+1} + y_{g+1} + y_{g+2} + \cdots + y_{n-g} + g\,y_{n-g})/n$.
Clearly both $y_{Tg}$ and $y_{Wg}$ pay less attention to extreme values than
does $\bar{y}$, but $y_{Wg}$ does not divert attention from the tails of the
sample so completely as does $y_{Tg}$. For underlying distributions whose
shapes are very close to Gaussian, the Winsorized means are less variable than
the trimmed means. When the underlying distribution is Gaussian, the
efficiency of the trimmed means is quite high, the fractional loss being
crudely $2g/3n$ (corresponding to efficiency of about 2/3 for the median), but
that of the corresponding Winsorized means is much higher. On the other hand,
trimmed means are clearly much more efficient than Winsorized means for
samples from very long-tailed distributions. The authors raise, but do not
answer, the question as to where the transition takes place. They define
$g$-times (symmetrically) trimmed and Winsorized sums of squared deviations in
an analogous manner:
$SSD_{Tg} = (y_{g+1} - y_{Tg})^2 + (y_{g+2} - y_{Tg})^2 + \cdots + (y_{n-g} - y_{Tg})^2$;
$SSD_{Wg} = g(y_{g+1} - y_{Wg})^2 + (y_{g+1} - y_{Wg})^2 + (y_{g+2} - y_{Wg})^2 + \cdots + (y_{n-g} - y_{Wg})^2 + g(y_{n-g} - y_{Wg})^2$.
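
The four quantities defined above translate directly into code. The following
is a minimal sketch, assuming the symmetric definitions just given; the
function names and the example data are illustrative and do not come from
Tukey and McLaughlin.

```python
def _core(y, g):
    """Sorted sample with the g smallest and g largest values removed."""
    y = sorted(y)
    n = len(y)
    if g < 0 or 2 * g >= n:
        raise ValueError("need 0 <= g < n/2")
    return y[g:n - g], n


def trimmed_mean(y, g):
    core, n = _core(y, g)
    return sum(core) / (n - 2 * g)


def winsorized_mean(y, g):
    # Each removed value is replaced by the nearest retained one.
    core, n = _core(y, g)
    return (g * core[0] + sum(core) + g * core[-1]) / n


def trimmed_ssd(y, g):
    core, _ = _core(y, g)
    t = trimmed_mean(y, g)
    return sum((v - t) ** 2 for v in core)


def winsorized_ssd(y, g):
    core, _ = _core(y, g)
    w = winsorized_mean(y, g)
    inner = sum((v - w) ** 2 for v in core)
    return g * (core[0] - w) ** 2 + inner + g * (core[-1] - w) ** 2


# Hypothetical sample with one wild value.
sample = [1, 2, 3, 4, 100]
print(trimmed_mean(sample, 1), winsorized_mean(sample, 1))
print(trimmed_ssd(sample, 1), winsorized_ssd(sample, 1))
```

With $g = 0$ both means reduce to the ordinary arithmetic mean, which is a
convenient sanity check.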

Wilks (1963) deals with the problem of identifying and testing a candidate
set of a small number $t$ of extreme sample elements as significant outliers
in a sample of size $n$ from a $k$-dimensional normal distribution with
unknown parameters. He considers the problem in detail for $t = 1,2,3,4$. He
defines criteria $r_1$ and $r_2$, respectively, for testing a single
observation as a significant outlier and a pair of observations as significant
outliers, small values of $r_1$ and $r_2$ constituting the critical regions.
Exact probabilities $P(r_1 < r)$ and $P(r_2 < r)$ are extremely complicated,
but the author gives values of $r_\alpha$ for which the upper bounds of
$P(r_1 < r_\alpha)$ and $P(r_2 < r_\alpha)$ have the value $\alpha$ for
$\alpha = 0.010$, 0.025, 0.050, 0.100; $k = 1,2,3,4,5$; and
$n = 5(1)30(5)100(100)500$.

Victor Chew (1964) discusses statistical criteria for the rejection of
observations suspected to contain gross errors, covering the cases of one
population or several populations, univariate or multivariate data, correlated
or uncorrelated observations. He points out the weaknesses in [...]
appropriate. He also tabulates critical values for [...] criteria. In the case
of a sample of independent observations from a single normal distribution with
unknown variance, he recommends Dixon's or Grubbs' criterion and avoiding
Chauvenet's. If an independent estimate of population variance is available,
he recommends the criterion of Nair or that of Quesenberry & David. For a
random sample from a bivariate normal distribution, he proposes a procedure
based on the radial distance if the parameters are known; otherwise, he
recommends Wilks' criterion. He points out that residuals from a regression
analysis are not only correlated but [often] also have heterogeneous
variances. Though he admits that not much work has been done in this area, he
recommends trying the methods of Lieberman and Miller and of Srikantan,
remarking that the latter may be more convenient. For $g$ samples of $n$
independent observations each, the method of Dixon or Grubbs can be applied to
the deviations from the sample means, though the control chart approach may be
more convenient for routine applications, especially if samples are obtained
sequentially.


Cyrus Derman (1964) shows that, for the truncated Cauchy distribution with
p.d.f. $g_z(x) = 1/[2(1+x^2)\tan^{-1}z]$ for $-z < x < z$ and 0 otherwise, the
variance of the mean of a sample of size $n$ is
$(z - \tan^{-1}z)/(n\tan^{-1}z)$, while the asymptotic variance of the sample
median is $(\tan^{-1}z)^2/n$. Hence the efficiency of the sample mean relative
to the sample median is $(\tan^{-1}z)^3/(z - \tan^{-1}z)$, which exceeds unity
if and only if $z < 3.41$, representing a truncation of more than 9%.
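
The crossover point quoted above can be checked numerically from the
efficiency formula alone. The sketch below is only a verification aid with
hypothetical function names; the bisection bracket is an assumption chosen so
that the efficiency is above unity at one end and below it at the other. The
tail function reads the quoted truncation as the standard Cauchy probability
beyond $z$ in a single tail, which is an interpretation rather than something
the text states.

```python
import math


def mean_vs_median_efficiency(z):
    """(arctan z)^3 / (z - arctan z): efficiency of the mean relative to the median."""
    a = math.atan(z)
    return a ** 3 / (z - a)


def crossover(lo=1.0, hi=10.0, tol=1e-10):
    """Bisection for the z at which the efficiency equals one."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mean_vs_median_efficiency(mid) > 1:
            lo = mid          # mean still better: move right
        else:
            hi = mid          # median better: move left
    return (lo + hi) / 2


def one_tail_fraction(z):
    """Standard Cauchy probability lying beyond +z (one tail)."""
    return 0.5 - math.atan(z) / math.pi


z_star = crossover()
print("crossover z      :", z_star)
print("one-tail fraction:", one_tail_fraction(z_star))
```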

Eisenhart (1964), in a discussion of the meaning of "least" in least
squares, points out that the method of least squares was developed originally
from three distinct points of view which differ not only in their aims and in
their initial assumptions, but also in the meanings that they attach to the
numerical results common to all three. These viewpoints are: (1) Least Sum
of Squared Residuals [Legendre (1805)]; (2) Maximum Probability of Zero Error
of Estimation [Gauss (1809)]; and (3) Least Mean Squared Error of Estimation
[Gauss (1823)]. He closes with the following remarks: "The robust survival
of the Method of Least Squares as a valuable tool of applied science no doubt
stems in part from the algebraic and arithmetical advantage of Least Sum of
Squared Residuals and in part from the fact this procedure also yields
estimates of Least Mean Squared Error in the important case when the end
results are linear functions of the basic observations. This one-to-one
correspondence between minimizing some function of the residuals and
minimizing the same function of Errors of Estimation appears to be a unique
property of Least Squares. And although the Method of Least Squares does not
lead to the best available estimates of unknown parameters when the law of
error is other than the Gaussian, if the number of independent observations
available is much larger than the number of parameters to be determined the
Method of Least Squares can be usually counted on to yield nearly-best
estimates".

Friedrich Gebhardt (1964) points out that some of the many procedures
that have been proposed for handling outlying observations are based on
statistics with the optimum property of minimizing, for certain alternative
hypotheses, the probability of the error of the second kind (accepting the
null hypothesis $H_0$ when it is false) given that of the error of the first
kind (rejecting $H_0$ when it is true), the observations that are not rejected
being used to estimate unknown parameters, e.g. the mean. He considers one
such procedure which gives rise to a one-parameter family of estimators for
the mean and compares their risks with those of the Bayes solutions with
respect to a one-parameter family of prior distributions.

Peter J. Huber (1964) treats in detail the theory of robust estimation of
a location parameter of a contaminated normal distribution with c.d.f.
$F(t) = (1-\epsilon)\Phi(t) + \epsilon H(t)$, $0 < \epsilon < 1$, where
$\epsilon$ is a known number, $\Phi(t)$ is the standard normal c.d.f., and
$H(t)$ is an unknown c.d.f. He seeks an estimator, intermediate between the
sample mean and the sample median, that is robust against deviations from
normality. Let $x_1, x_2, \cdots, x_n$ be a random sample of size $n$, and let
the estimator $T$ be chosen so as to minimize $\sum_i \rho(x_i - T)$, where
$\rho$ is a known function. If we take $\rho(t) = t^2$ we get the usual
least-squares estimator, the sample mean, while $\rho(t) = |t|$ yields the
sample median and $\rho(t) = -\log f(t)$, where $f$ is the assumed density,
yields the maximum likelihood estimator. Huber shows that the most robust
estimator (the one with the lowest supremum of the asymptotic variance when
$H$ ranges over all symmetric distributions) corresponds to
$\rho(t) = t^2/2$ for $|t| < k$, $\rho(t) = k|t| - k^2/2$ for $|t| \ge k$,
where $k$ and $\epsilon$ are related by
$\sqrt{2\pi}/(1-\epsilon) = \int_{-k}^{k} e^{-t^2/2}\,dt + (2/k)e^{-k^2/2}$.
He also considers robust estimation of a scale parameter, which he finds more
difficult and less satisfactory.
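
A small sketch may help fix ideas about this estimator. It assumes the scale
is known and equal to one (as in the contaminated standard normal model
above), and it uses iteratively reweighted averaging as the numerical method,
which is a common device and not necessarily the procedure Huber describes.
All function names and the sample values are illustrative.

```python
import math
from statistics import median


def huber_rho(t, k):
    """Quadratic near zero, linear beyond k, as defined in the text."""
    return 0.5 * t * t if abs(t) < k else k * abs(t) - 0.5 * k * k


def huber_location(x, k, tol=1e-12, max_iter=500):
    """Minimize sum(huber_rho(x_i - T, k)) over T, assuming unit scale."""
    T = median(x)                      # robust starting value
    for _ in range(max_iter):
        # Weight 1 inside [-k, k]; weight k/|r| outside, which caps the pull.
        w = [1.0 if abs(xi - T) <= k else k / abs(xi - T) for xi in x]
        T_new = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)
        if abs(T_new - T) < tol:
            return T_new
        T = T_new
    return T


def contamination_for(k):
    """Solve the k-epsilon relation above for epsilon, given k > 0."""
    central_mass = math.erf(k / math.sqrt(2))            # integral of phi over [-k, k]
    phi_k = math.exp(-k * k / 2) / math.sqrt(2 * math.pi)
    return 1 - 1 / (central_mass + 2 * phi_k / k)


# Hypothetical data: four ordinary readings and one gross error.
print(huber_location([-1, 0, 1, 2, 50], k=3.0))
print(contamination_for(1.14))
```

For a sample in which every residual lies within $k$ of the estimate, all
weights equal one and the result coincides with the sample mean.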

K. V. Mardia (1964) obtains the exact distributions of extremes, ranges
and midranges in samples from any multivariate population. He lets
$(x_{1j}, \cdots, x_{kj})$, $j = 1, \cdots, n$, be a random sample from a
$k$-variate continuous population with p.d.f. $f(x_1, \cdots, x_k)$, and
denotes the minimum and maximum of the $i$-th variate by $X_i$ and $Y_i$
respectively, the range by $R_i = Y_i - X_i$ and the $i$-th midrange by
$(X_i + Y_i)/2$, where $i = 1, \cdots, k$. He finds the joint distributions of
the extremes, the ranges and the midranges, which reduce to the classical
forms for $k = 1$.

Mary G. Natrella (1964) compares the variances of the unique medians of
samples of size $m$ (for odd values of $m$) with those of the pseudo-medians
(the averages of the two central observations) for $m$ even. She investigates
samples from normal, rectangular, and extreme value distributions, and finds
that no general conclusion is possible as to whether it is better to take $m$
odd or even.

Albert Stanley Paulson (1964) gives a probability basis for the computation
of certain measures of effectiveness of test statistics and derives analytical
expressions for these measures. He computes these measures for several test
statistics for the rejection of outliers and makes comparisons to show the
degree to which some statistics are better than others. In particular, he
finds that the standardized extreme deviate test is more efficient than the
chi-square test in detecting location error when the population variance is
known. When the population variance is unknown, he finds that the
Quesenberry-David statistic using a pooled estimate of the population standard
deviation in the denominator is more efficient than the studentized extreme
deviate.

E. S. Pearson and M. A. Stephens (1964) extend the table of percentage
points of the ratio $u = w/s$ of the range $w$ of a sample of $n$ observations
from a normal population having standard deviation $\sigma$ to the
root-mean-square estimate $s$ of $\sigma$ derived from the same sample, which
was computed by David, Hartley & Pearson (1954), and test the accuracy of the
approximation used by those authors.


Alex Rosengard (1964a,c) establishes results on the limiting independence
of means (quantiles) and extreme values not related to the existence of
limiting distributions (reduced limiting distributions) for these statistics.
Rosengard (1964b) shows that, when the variance exists, the joint distribution
of the mean and a quantile has, as its limiting form, a specified bivariate
normal distribution.

Thomas J. Rothenberg, Franklin M. Fisher and C. B. Tilanus (1964) propose
a class of estimators of the center of the Cauchy distribution. Each estimator
in the class is the arithmetic mean of a central subset of the sample order
statistics. The sample median is a member of this class, but it is not the
most efficient. The average of approximately the middle quarter of the ordered
sample has the lowest asymptotic variance.
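
An estimator of this class is easy to write down. The sketch below is an
illustration only: the function name, the rounding rule used to pick a
centered subset, and the default fraction of one quarter (taken from the
wording above) are assumptions, not details from the paper.

```python
def central_mean(sample, fraction=0.25):
    """Arithmetic mean of a centered block of the ordered sample.

    `fraction` is the approximate share of order statistics retained.
    """
    y = sorted(sample)
    n = len(y)
    m = max(1, round(n * fraction))   # number of central values to keep
    if (n - m) % 2:                   # make the block exactly centered
        m += 1
    start = (n - m) // 2
    core = y[start:start + m]
    return sum(core) / len(core)


# Hypothetical heavy-tailed sample.
data = [-1000, -1, 0, 1, 2, 3, 4, 1000]
print(central_mean(data))             # middle quarter
print(central_mean(data, 1.0))        # whole sample: the ordinary mean
```

Setting the fraction small enough to retain one value (odd $n$) or two values
(even $n$) gives the sample median, the other member of the class named above.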

Asit Prakas Basu (1965) proposes some tests for outliers in the case of the
exponential distribution with p.d.f.
$f(x) = e^{-(x-\gamma)/\theta}/\theta$, $x \ge \gamma$, $\gamma > 0$,
$\theta > 0$. When $\gamma$ and $\theta$ are known, the following test
statistics, whose distributions are easily obtained, may be used to test
whether the largest value $x_n$ of an ordered sample of size $n$ is an
outlier: $B_0 = (x_n - \gamma)/\theta$, $B_1 = (x_n - x_1)/\theta$,
$B_2 = (x_n - x_{n-1})/\theta$. Similar tests can be devised for the smallest
value $x_1$. As an overall test of the presence of outliers one may use the
$\chi^2$ test, since under the null hypothesis
$2\sum_{i=1}^{n}(n-i+1)z_i/\theta$, where $z_i = x_i - x_{i-1}$ and
$x_0 = \gamma$, follows the chi-square distribution with $2n$ degrees of
freedom. When $\gamma$ and $\theta$ are not known, one can use the
standardized deviate $U_n = (x_n - x_1)/\sum(x_i - x_1)$, whose distribution
has been derived by Laurent (1963).

G. P. Bhattacharjee (1965) investigates the effect of non-normality on the
distribution of range by deriving the probability integral and the first two
moments of the range of a sample drawn from a population represented by the
first four terms of an Edgeworth series. He then examines the use of range
in place of root-mean-square deviation as an estimator of the standard
deviation of a non-normal population. He concludes that the range estimator of
the population standard deviation is better for a platykurtic parent
population (and worse for a leptokurtic one) than the corresponding estimator
for a normal population, thus contradicting an erroneous conclusion reached by
Cox (1954).

Peter J. Bickel (1965) states the main results of the asymptotic theory
of the Winsorized and trimmed means and outlines the proof. He discusses an
alternative method of trimming and Winsorization (not equivalent to that of
Tukey) which encompasses the efficient estimates proposed by Huber and
generalizes to higher dimensions. He gives the minimum efficiency, with
respect to the families of all symmetric and symmetric unimodal distributions,
of Winsorized and trimmed means with respect to the mean. He compares the
trimmed mean and the Winsorized mean with the Hodges-Lehmann estimate (the
median of averages of pairs) and the principal estimate proposed by Huber with
the mean and the Hodges-Lehmann estimate. He concludes that although all the
proposed "nonparametric" estimates of location behave satisfactorily when
compared with the mean, with the possible exception of the Winsorized mean,
the Hodges-Lehmann estimate seems to be the "safest" among them.

H. A. David and A. S. Paulson (1965) summarize and extend the results given
by Paulson (1964) on the performance of several tests for outliers. They note
that such tests "generally have one of the following aims: (a) to screen data
in routine fashion preparatory to analysis (the problem of 'rejection of
outliers'); (b) to sound an alarm that outliers are present, thus indicating
the need for closer study of the data-generating process; (c) to pin-point
observations which may be of special interest just because they are extreme."
They do not treat case (a), which is the one of the primary interest in the
present study, but they do cite some references to it.

E. J. Gumbel, P. G. Carlson and C. K. Mustafi (1965) prove that if the
initial distribution (parent population) is unlimited, differentiable,
symmetrical and unimodal, the distribution of the midrange, for any sample
size, is also unlimited, differentiable, symmetrical and unimodal. This
extends a result of Gumbel (1944).

Shanti S. Gupta and Bhupendra K. Shah (1965) derive the exact expressions
for the moments of the order statistics of samples of size $n$ from a standard
logistic distribution $L(0,1)$, where $L(\mu,\sigma^2)$ has the c.d.f.
$F(y;\mu,\sigma) = 1/[1 + \exp\{-[(y-\mu)/\sigma](\pi/\sqrt{3})\}]$. They
tabulate the first four exact moments of the $k$-th order statistic $X_{(k)}$
for $n = 1(1)10$, $k = 1(1)n$. They also tabulate percentage points of
$X_{(k)}$ for $n = 1(1)10$, $k = 1(1)n$ and for $n = 11(1)25$, $k = 1$, $n$
and $n/2$, $(n+2)/2$ ($n$ even) or $(n+1)/2$ ($n$ odd). They also derive
expressions in closed form for the cumulative distribution function and the
density function of the range, both of which they tabulate for $n = 2,3$.
Shah (1965) obtains the distributions of semirange and midrange of samples
from the logistic population.

K. V. Mardia (1965) gives alternative proofs of the formulas of Tippett
(1925) for the expected values $E(R)$ and $E[R - E(R)]^m$, where $m$ is a
positive integer and $R$ is the range of a sample of size $n$ from a
continuous population. These proofs are simpler than those given by Tippett
and other authors, and hold for all $n$, whereas Tippett, in his proof of the
latter formula, assumed $n$ to be even. The author also finds the exact value
of the variance of the range of a sample of size $n = 3$ from a normal
population; similar results for $n = 2$ and $n = 4$ were given by Ruben
(1956).

Michael E. Tarter and Virginia A. Clark (1965) show that the cumulative
distribution function (c.d.f.) and the moment generating function (m.g.f.) of
the logistic distribution can each be expressed as a Maclaurin series where
the coefficients are simple functions of Bernoulli numbers. They give the
m.g.f. of the median, and determine the variance of the median, also the
efficiency of the median relative to the mean for various sample sizes, as
well as its asymptotic efficiency. They also give the variance of any order
statistic and the covariance of any two order statistics.

Tukey (1965) studies the informativeness of specific order statistics or
blocks of consecutive order statistics in a sample.

F. J. Anscombe and Bruce A. Barron (1966) consider a particular procedure
for rejecting outliers and also a particular procedure for modifying outliers,
for samples of size three assumed to have been drawn from a common normal
population, except that one of the three readings may have an added bias. They
give numerical results illustrating the effects of the procedures on
estimation of the location parameter. They conclude that estimation by least
squares should usually be tempered by successive application of both a
rejection rule and a modification rule.

R. P. Bland, R. D. Gilbert, C. H. Kapadia and D. B. Owen (1966) extend
the results of McKay & Pearson (1933), Lord (1947), Resnikoff (1954), Harter
& Clemm (1959) and other authors to obtain exact results for additional cases
of the distributions of the range and mean range for samples from a normal
population.

Dorian Feldman and Howard G. Tucker (1966) study consistent estimates
of non-unique quantiles of a distribution function. As a special case they
consider the problem of medians, especially the sample median of the set of
averages of all $\binom{n+1}{2}$ pairs of observations
$X_1, X_2, \cdots, X_n$ [the Hodges-Lehmann estimate of the location
parameter]. They prove that this sample median converges almost surely to the
center median of the original population, provided that the original
distribution is symmetric about a median; otherwise, this sample median of
averages of pairs need not converge, and even if it did converge, it might
converge to a number which is not a median of the parent distribution.

Joseph L. Gastwirth (1966) discusses a procedure for finding robust
estimators, based on robust rank tests, of the location parameter of the
symmetric unimodal distributions. Not only can the Hodges-Lehmann estimator be
constructed from the author's procedure, but this procedure can also be used
to generate another estimator $T$, which is the best linear unbiased estimator
of the location parameter. The best linear unbiased estimator corresponding to
the least favorable distribution of Huber (1964) is the trimmed mean.

Friedrich Gebhardt (1966) relaxes the restriction in his earlier paper
[Gebhardt (1964)] that the respective variances of stragglers and
non-stragglers be known by requiring only that the ratio of these variances be
known. The results for a variety of cases which he studies support the
suggestion of Tukey (1962) to trim the sample by all observations that deviate
substantially from the sample mean and to Winsorize those observations that
deviate moderately, but trimming exactly two observations is almost always a
better strategy than Winsorizing two.

J. Likeš (1966) finds the distributions of Dixon's statistics for rejection
of outliers in the case of a sample from an exponential population, tabulates
their percentage points, and gives examples of their use.

Donald T. Searls (1966) proves that if a Winsorized mean is formed by
replacing all sample values larger than a predetermined cutoff point $t$ by
the value $t$ itself, there exists a region for $t$ for certain common
distributions such that the mean square error of the Winsorized mean is
smaller than the variance of the ordinary mean. He presents an example which
shows that a wide range of cutoff points can be chosen which still result in a
gain.
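
The existence of such a region can be illustrated in closed form for one
convenient case. The sketch below assumes a unit exponential population (mean
1, variance 1); that choice, the function names, and the sample size used in
the demonstration are assumptions made for illustration and are not taken from
Searls' example. For this population $E[\min(X,t)] = 1 - e^{-t}$ and
$E[\min(X,t)^2] = 2(1 - e^{-t} - te^{-t})$.

```python
import math


def mse_cutoff_mean_exponential(n, t):
    """MSE, about the true mean 1, of the average of min(X_i, t).

    X_1..X_n are assumed independent unit exponential (an illustrative choice).
    """
    m1 = 1 - math.exp(-t)                              # E[min(X, t)]
    m2 = 2 * (1 - math.exp(-t) - t * math.exp(-t))     # E[min(X, t)^2]
    variance = m2 - m1 ** 2
    bias = m1 - 1.0                                    # always negative
    return variance / n + bias ** 2


def variance_ordinary_mean(n):
    return 1.0 / n


n = 10  # hypothetical sample size
for t in (0.5, 2.0, 4.0, 8.0):
    gain = variance_ordinary_mean(n) - mse_cutoff_mean_exponential(n, t)
    print(f"t = {t:4.1f}  MSE = {mse_cutoff_mean_exponential(n, t):.5f}  gain = {gain:+.5f}")
```

A cutoff that is too low lets the squared bias dominate, while a very high
cutoff simply reproduces the ordinary mean; the gain appears in between.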

O. B. Sheynin (1966) contends that J. H. Lambert should be given precedence
over Gauss as the originator of the theory of errors. He gives a concise
summary of Lambert's works on the subject, which was the principal source of
information used by the present writer concerning these works [Lambert (1760,
1765a,b)].

Thomas A. Willke (1966) reports the results of a sampling study of the
estimation of the mean and standard deviation from the closest two of three
observations in a sample from a normal population contaminated by slippage of
the mean. The results of Lieblein (1952), which indicated that use of the
closest two out of three is not advisable for noncontaminated samples, are
borne out by this study for contaminated samples as well.

J. N. Adichie (1967) defines point estimates $\hat\alpha$ and $\hat\beta$
of the parameters $\alpha$ and $\beta$ in the linear regression equation
$Y = \alpha + \beta X$ in terms of certain statistics used to test hypotheses
concerning $\alpha$ and $\beta$. He shows that the least squares estimates are
a special case of these estimates. He proves that "rank score" estimates exist
and shows how to compute them. He discusses both the small-sample and
asymptotic properties of these estimates. He shows that $\hat\alpha$ and
$\hat\beta$ are unbiased if the underlying distribution of observations is
symmetric. He also proves that $\hat\alpha$ and $\hat\beta$ are jointly
asymptotically normal, and that the asymptotic efficiency of
$(\hat\alpha, \hat\beta)$ is the same as the Pitman efficiency of the rank
tests on which they are based, relative to the classical tests. Finally he
compares the efficiencies of these estimates and the Brown-Mood median
estimates.

F. J. Anscombe (1967) reviews various topics relevant to the present study,
including the effect of modern computers on statistical calculation, "stepwise
regression", testing goodness of fit by examining residuals, and possible
alternatives to the method of least squares appropriate when the distribution
of errors has long tails. He points out that Laplace and Gauss justified using
the method of least squares and restricting attention to linear combinations
of the observations largely on the basis of computational simplicity and
feasibility. In the age of computers this justification is no longer valid.
Suppose one has $n$ sets of observations on a dependent variable and $p$
independent variables, denoted respectively by $y_i$ and $x_{ir}$
($i = 1,2,\cdots,n$; $r = 1,2,\cdots,p$). If it is assumed that the true $y$
is a linear function of the $x$'s, one can set
$y_i = \sum_r x_{ir}\beta_r + e_i$, where the $e$'s are independent with zero
mean, so that $\mu_i = \sum_r x_{ir}\beta_r$, where $\mu_i$ is the expected
value of $y_i$. The problem is then to estimate the parameters $\beta_r$ as
precisely as possible. In the classical theory it is assumed that the $e$'s
are normally distributed, so that the method of least squares is optimal.
Anscombe points out that it is possible to fit the $\beta$'s by stages (e.g.,
one at a time) and advocates testing goodness of fit by examining the
residuals. He then raises the question as to what should be done if the
distribution of the $e$'s is not normal. If the distribution is skewed, he
suggests that it may be symmetrized by transforming the $y$-scale, usually by
raising each $y$ to some fixed power (or by taking logarithms); he points out,
however, that such a transformation has other consequences which may or may
not be desirable. If the distribution of the $e$'s is symmetric but
platykurtic (shorter-tailed than the normal) or leptokurtic (longer-tailed
than the normal), a different remedy is required, involving departure from
the method of least squares. Anscombe studies in detail the case of a
long-tailed distribution of errors. If one wishes only to find estimates of
the $\beta$'s without any indication of their precision, he suggests
minimizing, instead of $\sum(y_i - \mu_i)^2$, $\sum\psi(y_i - \mu_i)$, where
$\psi(\cdot)$ is the square function for small values of the argument but
increases less rapidly for larger values and is constant for very large
values. Specifically, he suggests choosing as the estimates $(\hat\beta_r)$
the values of $(\beta_r)$ that minimize
$\sum_{(1)}(y_i - \mu_i)^2 + \sum_{(2)}K_1(2|y_i - \mu_i| - K_1) + \sum_{(3)}K_1(2K_2 - K_1)$,
where $K_1$ and $K_2$ are chosen numbers ($K_2 > K_1 > 0$), $\sum_{(1)}$
denotes summation over the values of $i$ such that $|y_i - \mu_i| \le K_1$,
$\sum_{(2)}$ denotes summation over the values of $i$ such that
$K_1 < |y_i - \mu_i| \le K_2$, and $\sum_{(3)}$ denotes summation over the
remaining values such that $|y_i - \mu_i| > K_2$. If, on the other hand, one
desires evidence from the data concerning the precision of the estimates
$(\hat\beta_r)$, it is necessary to make some assumption about the true
distribution of the $e$'s. Let the error density function be represented by
$f(e|\alpha,\sigma)$, where typically $\alpha$ is a shape parameter and
$\sigma$ is a scale parameter. He notes that apparently the only kind of
density $f(e|\alpha,\sigma)$ that permits easy integration with respect to
$\sigma$ in closed form is
$f(e|\alpha,\sigma) = (a_\alpha/\sigma)\exp(-c_\alpha|e/\sigma|^\alpha)$,
where $a_\alpha$ and $c_\alpha$ are functions of $\alpha$. When $\alpha = 2$
we have normality, and when $1 < \alpha < 2$ we have a smooth function of $e$
with longer tails than the normal density. When $\alpha = 1$ we have the
double exponential (Laplace's first) distribution, concerning which Jeffreys
(1939) [2nd ed. (1948), Sec. 4.4] has commented: "The interest of the law is
reduced somewhat by the fact that there do not appear to be any cases where it
is true." Anscombe expresses the opinion that the same remark can be made
about the distribution when $\alpha = 1.5$ (say). He advocates instead
Jeffreys' form of the Pearson Type VII error distribution, with density
function
$f(e|m,\sigma) = [a_m/(\sqrt{\pi}\,\sigma)](1 + c_m e^2/\sigma^2)^{-m}$, where
$c_m = m^2/[2(m - 1/2)^3]$, $a_m = \sqrt{c_m}\,\Gamma(m)/\Gamma(m - 1/2)$,
which approaches normality as $m \to \infty$, for some appropriate value of
$m$. Anscombe suggests $m = 4$, for which the Type VII distribution is
equivalent to the Student $t$ distribution with 7 degrees of freedom. He
proceeds to investigate the likelihood function of the Type VII distribution.
He points out that maximizing this likelihood function is rather like
minimizing the expression $\sum_{(1)} + \sum_{(2)} + \sum_{(3)}$ mentioned
above. For the Type VII distribution, $m > 1/2$; however, for negative values
of $m$, the same density function gives a Type II distribution over a finite
interval (a distribution with shorter tails than the normal).
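
The three-part criterion described above (quadratic for small residuals,
linear for moderate ones, constant for very large ones) can be written as a
single function of the residual. The sketch below is illustrative: the
function names, the cutoffs $K_1 = 2$ and $K_2 = 4$, and the data are
hypothetical, and the code only evaluates the criterion for given $\beta$; it
does not carry out the minimization.

```python
def anscombe_psi(r, k1, k2):
    """Contribution of one residual r to the criterion, with 0 < k1 < k2."""
    a = abs(r)
    if a <= k1:
        return r * r                  # sum (1): ordinary squared residual
    if a <= k2:
        return k1 * (2 * a - k1)      # sum (2): grows linearly in |r|
    return k1 * (2 * k2 - k1)         # sum (3): constant, residual is ignored


def criterion(y, X, beta, k1, k2):
    """Sum of anscombe_psi over observations for the linear model mu_i = x_i . beta."""
    total = 0.0
    for yi, row in zip(y, X):
        mu = sum(x * b for x, b in zip(row, beta))
        total += anscombe_psi(yi - mu, k1, k2)
    return total


# Hypothetical data: a constant-only model with one wild observation.
y = [1.0, 2.0, 10.0]
X = [[1.0], [1.0], [1.0]]
print(criterion(y, X, beta=[1.0], k1=2.0, k2=4.0))
```

The pieces join continuously at $|r| = K_1$ and $|r| = K_2$, so an observation
far from the fit contributes a fixed amount and no longer influences where the
minimum lies.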

Allan Birnbaum and Eugene M. Laska (1967a,b) present a general method of
determining  efficiency-robust  estimation  methods,  which  they  use  to  derive 
admissible  linear  unbiased  estimators  (whose  respective  variances,  over  a 
specified  family  of  symmetric  shapes  of  the  error  distribution,  cannot  be 
jointly  improved)  and  maximin-efficient  linear  unbiased  estimators  (which 
maximize  the  minimum  asymptotic  efficiency,  within  a  class  of  estimators,  for 
a  family  of  densities). 

Edwin L. Crow and M. M. Siddiqui (1967) derive robust estimators of the
location parameter which are efficient over a class of two or more forms
(pencils) of continuous symmetric unimodal distributions. The pencils
considered are the normal, double exponential, Cauchy, parabolic, triangular,
and rectangular. The estimators considered are trimmed means, Winsorized
means, "linearly weighted" means, and a combination of the median and two
other order statistics. Asymptotically these are compared with the
Hodges-Lehmann estimator. The best trimmed mean or linearly weighted mean has
an asymptotic efficiency of at least 0.82, relative to the best estimator for
any single pencil, over a range of pencils of distributions from the normal to
the Cauchy, while the combination of the median and two other order statistics
is at least 0.80 efficient over the same range.
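As a minimal sketch of two of the estimators named above, the following
Python fragment computes a trimmed mean and a Winsorized mean. The function
names and the parameter g (the number of observations treated at each end)
are illustrative assumptions, not notation taken from Crow and Siddiqui.

```python
from statistics import mean


def trimmed_mean(xs, g):
    """Mean of the sample after discarding the g smallest and g largest items."""
    s = sorted(xs)
    if g < 0 or 2 * g >= len(s):
        raise ValueError("g must satisfy 0 <= 2g < n")
    return mean(s[g:len(s) - g])


def winsorized_mean(xs, g):
    """Mean after replacing the g smallest (largest) items by the nearest
    retained order statistic instead of discarding them."""
    s = sorted(xs)
    n = len(s)
    if g < 0 or 2 * g >= n:
        raise ValueError("g must satisfy 0 <= 2g < n")
    core = s[g:n - g]
    return mean([core[0]] * g + core + [core[-1]] * g)


sample = [1, 2, 3, 4, 100]          # one wild observation
print(mean(sample), trimmed_mean(sample, 1), winsorized_mean(sample, 1))
```

With g = 0 both functions reduce to the ordinary sample mean.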

Hodges (1967) makes a further study of the Hodges-Lehmann estimate T,
which he recognizes as a member of a class of estimates. He explores this
class for other members which are easier to compute, and finds that one of the
simplest of these, D, which is defined as the median of the means of pairs of
symmetric order statistics, corresponds to the one-sample analog of Galton's
rank-order test. He applies D to the same samples used with T and obtains very
similar results. Finally, he compares a number of estimates, including X̄ (the
mean), T, D, and the trimmed and Winsorized means, with regard to normal
efficiency, ease of computation, and extreme value tolerance. Bickel and
Hodges (1967) derive the asymptotic theory of Galton's test and the related
estimate D, which gives an explicit form to the limiting distribution of D
only for rectangular and Laplace parent populations. Although the limit is not
normal, they conclude that the scatter of D is quite close to that of T.
Finally, they give the small-sample distribution of D for a rectangular
parent. Although their evidence is incomplete, they conclude that D is robust
as well as easy to compute.

... distribution satisfying suitable regularity conditions, an approximation
to the variance of the median X̃ up to terms of order 1/n² and a corresponding
approximation to the efficiency of X̃ relative to the mean X̄ up to terms of
order 1/n. For normal and rectangular distributions, these give a much closer
approximation to the exact efficiency than does the usual asymptotic
efficiency. The authors point out that, to the accuracy of their
approximation, one should not use the median based on an odd number of
observations since the median based on the next smaller even number is equally
accurate. They extend their results to other averages of two symmetrically
placed order statistics (quasi-medians), thus making possible, in some cases,
a further reduction in sample size without loss of accuracy.
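The averages of symmetrically placed order statistics, the estimate D built
from them, and the Hodges-Lehmann estimate T can be sketched as follows. This
is an illustration only: the function names are hypothetical, the Walsh
averages are taken over all pairs i <= j (some authors use i < j), and for odd
n the middle observation is assumed to be paired with itself.

```python
from statistics import median


def symmetric_pair_means(xs):
    """Averages of the i-th smallest and i-th largest observations."""
    s = sorted(xs)
    n = len(s)
    return [(s[i] + s[n - 1 - i]) / 2 for i in range((n + 1) // 2)]


def hodges_d(xs):
    """Median of the means of pairs of symmetric order statistics."""
    return median(symmetric_pair_means(xs))


def hodges_lehmann(xs):
    """Median of the Walsh averages (x_i + x_j)/2 over all pairs i <= j."""
    n = len(xs)
    walsh = [(xs[i] + xs[j]) / 2 for i in range(n) for j in range(i, n)]
    return median(walsh)


sample = [1, 2, 3, 4, 100]
print(hodges_d(sample), hodges_lehmann(sample))
```

D needs only about n/2 averages once the sample is sorted, whereas T as
written here forms n(n+1)/2 Walsh averages, which is why D is described as
the easier of the two to compute.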

Robert V. Hogg (1967) finds an estimator T, which is a weighted mean of
T_1, T_2, ..., T_m, where T_j is a reasonable estimator (e.g., a minimum mean
square error estimator) of the parameter θ of the family D_j of distributions,
j = 1, 2, ..., m, such that T has the same asymptotic distribution as that of
T_j when the sample comes from D_j. The weights are functions of the sample
items. The author gives empirical evidence that T is satisfactory for small
sample sizes. He proves that if T_j and the weight W_j are odd location and
even location-free statistics, respectively, then T = Σ W_j T_j, where
Σ W_j = 1, is an unbiased estimator of the center of every symmetric
distribution, provided certain expectations exist. This fact is useful in
constructing the weight function W_j. The particular T which the author
investigates empirically is given by T = X̄^c_{1/4} for k < 2.0, T = X̄ for
2.0 ≤ k ≤ 4.0, T = X̄_{1/4} for 4.0 < k ≤ 5.5, and T = m for 5.5 < k, where
X̄^c_{1/4} is the mean of the [n/4] smallest and the [n/4] largest items in the
sample (an interior-trimmed mean), X̄_{1/4} is the mean of the remaining
interior sample items (an exterior-trimmed mean), X̄ is the sample mean, m is
the sample median, and k = Σ(x_i - x̄)⁴/ns⁴ is the sample kurtosis. He compares
the ratio of the variances of X̄, m, M (the Hodges-Lehmann estimate) and T for
200 samples each of sizes 7 and 25 from four populations with kurtosis
K = 1.9, 2.7, 3.9, 9.9. For both sample sizes, T has the smallest variance for
K = 1.9, X̄ for K = 2.7, and M for K = 3.9, 9.9. The mean performs very poorly
for distributions with long tails (high K), the median performs rather poorly
for those with short tails (low K), and the overall performance of both M and
T is good.
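The kurtosis-based selection rule described above can be sketched in a few
lines. This is a hedged illustration rather than Hogg's own program: the
cut-offs are read here as 2.0, 4.0 and 5.5, the variance in the kurtosis is
assumed to use divisor n, [n/4] is taken as the integer part of n/4, and the
function names are hypothetical.

```python
from statistics import mean, median


def sample_kurtosis(xs):
    """k = sum (x - xbar)^4 / (n s^4), with s^2 computed with divisor n."""
    n = len(xs)
    xbar = mean(xs)
    m2 = sum((x - xbar) ** 2 for x in xs) / n
    m4 = sum((x - xbar) ** 4 for x in xs) / n
    return m4 / m2 ** 2


def hogg_estimate(xs):
    """Choose among four location estimators according to sample kurtosis."""
    s = sorted(xs)
    n = len(s)
    if n < 4:
        raise ValueError("need at least four observations")
    q = n // 4
    k = sample_kurtosis(s)
    if k < 2.0:                      # short tails: mean of the outer items
        return mean(s[:q] + s[n - q:])
    if k <= 4.0:                     # near-normal: ordinary sample mean
        return mean(s)
    if k <= 5.5:                     # moderately long tails: interior items
        return mean(s[q:n - q])
    return median(s)                 # very long tails: sample median


print(hogg_estimate([0, 0, 0, 0, 0, 0, 0, 0, 0, 100]))
```

A sample with one wild value has a large kurtosis and falls through to the
median, while a two-point sample such as [1, 1, 1, 1, 9, 9, 9, 9] has kurtosis
1 and is handled by the mean of the outer items.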

Fred C. Leone, Toke Jayachandran and Stanley Eisenstat (1967) report on
an empirical study of the performance of the sample mean and the
Hodges-Lehmann and Huber estimators of the location parameter when applied to
contaminated distributions. They verify Huber's statement that his estimator T
and the Hodges-Lehmann estimator are close competitors and his conjecture
that, for a sample of size n, the distribution of the ratio of √n times his
estimator T of the location parameter to the estimator S of the scale
parameter can be approximated by a Student t-distribution. They compare the
sample variance of n^(1/2) T with its maximal asymptotic variance as given by
Huber. They also verify, by means of a chi-square test of goodness of fit,
that the distributions of the Hodges-Lehmann estimator and of Huber's T can be
approximated fairly well by appropriate normal distributions when n > 20.

Max Ray Mickey, Olive Jean Dunn and Virginia A. Clark (1967) point out
that the examination of residuals is not always sufficient to identify
outliers in a regression model, and propose stepwise regression. Their
procedure finds the single observation whose deletion causes the greatest
reduction in the sum of squared residuals, then repeats the process on the
remaining data. The selection of stopping rules is left open.
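The deletion procedure just described may be illustrated for a straight-line
fit as follows. This is a sketch under stated assumptions, not the authors'
program: the model is restricted to one regressor, the function names are
hypothetical, and because the stopping rule is left open the number of
deletions is simply supplied by the caller.

```python
def rss_line(pts):
    """Residual sum of squares of the least-squares line through pts."""
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    b = sxy / sxx
    a = my - b * mx
    return sum((y - a - b * x) ** 2 for x, y in pts)


def delete_stepwise(pts, steps):
    """Repeatedly drop the point whose deletion leaves the smallest RSS,
    i.e. causes the greatest reduction in the sum of squared residuals."""
    remaining = list(pts)
    removed = []
    for _ in range(steps):
        best = min(range(len(remaining)),
                   key=lambda i: rss_line(remaining[:i] + remaining[i + 1:]))
        removed.append(remaining.pop(best))
    return removed, remaining


data = [(0, 0), (1, 2), (2, 4), (3, 6), (4, 20), (5, 10)]
print(delete_stepwise(data, 1)[0])
```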

M. M. Siddiqui and K. Raghunandanan (1967) supplement the paper of Crow
and Siddiqui (1967) by a study of the robustness properties of four estimators
of location (a weighted average of the median and two other symmetric order
statistics, the trimmed mean, the Winsorized mean, and the Hodges-Lehmann
estimator) with respect to eight distribution types (normal, .01 and .05
contaminated normal, logistic, Student's t with 3 and 5 degrees of freedom,
double exponential, and Cauchy). For each of these types the probability
density function is continuous and symmetric about the mean and the range is
infinite. The estimator with the highest guaranteed efficiency for the entire
class of distributions is the mean of the middle 50% of the sample. The
authors state that the Hodges-Lehmann estimator was first suggested by Tukey,
but the present writer finds no evidence of this in the source cited, where
Tukey does use the averages of all pairs of observations, which he calls Walsh
averages, but does not suggest using their median as an estimator of the
location parameter.

Chanchal Singh (1967) obtains expressions for the raw moments and the
probability integral of the largest (smallest) value and for the first two
moments of the range of samples from non-normal populations represented by the
first four terms of the Edgeworth series. He gives some conclusions about the
nature of the effects of parental skewness and kurtosis. The mean largest
(smallest) value is sensitive to parental skewness but not to kurtosis in
small samples. However, parental kurtosis tends to increase or decrease the
variance depending upon whether β₂ - 3 is positive or negative. The mean range
is quite insensitive to population changes in small samples, but both skewness
and kurtosis have a greater effect on the variance of the range. The author
compares his results with those of Pearson (1950), Cox (1954), David (1954)
and others.

G. C. Tiao and Irwin Guttman (1967) point out that a major difficulty
involved in statistical procedures, designed to guard against the occurrence
of outliers or spurious observations, which are based upon examining the
magnitude of the residuals, is caused by the fact that the residuals are
correlated. They show how to avoid this difficulty by adjusting the residuals
on the basis of information from an auxiliary experiment so that the adjusted
residuals become uncorrelated. This leads to a set of estimation procedures,
for the unknown mean of a normal population N(μ, σ²) with known σ, in which
one or more observations for which the magnitudes of the adjusted residuals
are largest will be excluded. The authors discuss certain properties of these
procedures, give exact numerical results for the cases of one and two spurious
observations, and generalize to the case of unknown variance.

G. K. Bhattacharyya (1968) obtains median and weighted median estimates
for the linear trend parameters of a univariate time series by applying the
Hodges-Lehmann method to some well-known nonparametric tests for trend. He
extends the estimation procedure to the multivariate trend model, and studies
its asymptotic efficiency properties relative to the classical estimates.

G. E. P. Box and G. C. Tiao (1968) consider the problem of outlying
observations from a Bayesian viewpoint. They assume that each observation in
an experiment may come from either a "good" run or a "bad" run. They specify
the models corresponding to good and bad runs [N(μ, σ²) and N(μ, k²σ²),
respectively] and the prior probability α that a run is bad, and employ
standard Bayesian inference procedures to derive the appropriate analysis.
They give an example of the application of their method to actual data, and
examine the sensitivity of the results to changes in α and k.
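A much simplified illustration of the good-run/bad-run idea is the posterior
probability that a single observation came from a bad run when μ, σ, k and α
are all treated as known. This is an assumption made purely for the sketch;
Box and Tiao's analysis deals with unknown parameters, and the function names
below are hypothetical.

```python
import math


def normal_pdf(y, mu, sd):
    """Density of N(mu, sd^2) at y."""
    z = (y - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def prob_bad(y, mu, sigma, k, alpha):
    """Posterior probability (Bayes' theorem) that y came from the bad run
    N(mu, k^2 sigma^2) rather than the good run N(mu, sigma^2)."""
    bad = alpha * normal_pdf(y, mu, k * sigma)
    good = (1.0 - alpha) * normal_pdf(y, mu, sigma)
    return bad / (bad + good)


for y in (0.0, 2.0, 4.0, 6.0):
    print(y, prob_bad(y, mu=0.0, sigma=1.0, k=5.0, alpha=0.05))
```

Re-running the loop with other values of alpha and k gives a rough feel for
the kind of sensitivity study mentioned above.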

Irving Wingate Burr and Peter J. Cislak (1968) give, in closed form, the
density function of the median for odd sized samples from the Burr system of
distributions [which has c.d.f. F(x) = 1 - (1 + x^c)^(-k), x ≥ 0, c, k > 0;
F(x) = 0, x < 0 and covers almost all of the regions of the main Pearson Types
IV and VI and an important part of that of the main Type I (Beta
distribution)]. All finite moments of the median x̃ are linear combinations of
Beta functions. For samples of size n = 3, 5, 7 and 11 from Burr populations
with α_{3:x} = 0, .50, 1.00, 1.50 and, corresponding to each α_{3:x}, two well
separated values of α_{4:x}, the authors tabulate the following important
characteristics of the median: bias, σ_x̃, α_{3:x̃}, α_{4:x̃} and efficiency
relative to the sample mean. They point out that it appears that, for this
system, the median begins to be more efficient than the mean at about the
degree of non-normality of the exponential distribution. Burr (1968)
tabulates, for samples of size n = 2, 3, 4, 5, 8, 10 from populations of the
Burr system with 27 different combinations of α_{3:x} and α_{4:x}, the
following characteristics for the distribution of range R: standardized mean
and standard deviation, α_{3:R}, α_{4:R} and coefficient of variation. He
confirms that the standardized mean range is highly stable for fixed n under
varying non-normality, as has been pointed out in the literature, and finds
that the same is true of the standardized standard deviation for the
populations studied, which have α_{4:x} > 2.87. He also finds evidence that
the range is somewhat more robust and efficient than hitherto noted.
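For a concrete view of the closed form, the density of the median of an odd
sized sample n = 2m + 1 is the standard order-statistic expression
n!/(m!)² F(x)^m [1 - F(x)]^m f(x), into which the Burr c.d.f. above can be
substituted. The sketch below does this and checks numerically that the result
integrates to one; the function names, the parameter values c = 2, k = 3 and
the trapezoidal check are illustrative assumptions, not part of the authors'
tables.

```python
import math


def burr_cdf(x, c, k):
    """F(x) = 1 - (1 + x^c)^(-k) for x >= 0, and 0 for x < 0."""
    return 1.0 - (1.0 + x ** c) ** (-k) if x > 0 else 0.0


def burr_pdf(x, c, k):
    """Derivative of the Burr c.d.f."""
    return c * k * x ** (c - 1) * (1.0 + x ** c) ** (-k - 1) if x > 0 else 0.0


def median_pdf(x, n, c, k):
    """Density of the sample median for odd n = 2m + 1."""
    m = (n - 1) // 2
    big_f = burr_cdf(x, c, k)
    coef = math.factorial(n) / math.factorial(m) ** 2
    return coef * big_f ** m * (1.0 - big_f) ** m * burr_pdf(x, c, k)


def trapezoid(f, a, b, steps):
    """Plain trapezoidal rule, used only as a sanity check."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, steps))
    return total * h


print(trapezoid(lambda x: median_pdf(x, 5, 2.0, 3.0), 0.0, 50.0, 100000))
```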

Theophilos Cacoullos (1968) presents a sequential scheme for detecting
outliers in a sample from a p-variate normal population N(μ, σ²I) in which
the components are independently normally distributed with the same variance
σ². At the k-th step of experimentation, when the first k observations are
available, the most recent observation x_k is rejected as an outlier if and
only if |x_k - x̄_k|² > c_α σ² when σ² is known or |x_k - x̄_k|² > c_α s_k² when
both μ and σ² are unknown, where x̄_k = Σ_{j=1}^{k} x_j/k,
s_k² = Σ_{j=1}^{k} |x_j - x̄_k|²/p(k-1), |x| denotes the length of x and c_α
denotes the upper α point of the χ² distribution with p degrees of freedom.
The author considers the sequential stopping rule N which stops taking
observations as soon as the first outlier is rejected. He proves that this
scheme terminates with probability one after a finite number of steps and that
the number of observations N has moments of every order. The probability P_k
that the k-th observation is rejected appears to be monotone increasing for
reasonable critical values of c and approaches α as k → ∞.
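The stopping rule for the case in which both the mean and the variance are
unknown can be sketched as follows. The sketch assumes the rejection criterion
compares the squared distance of the newest observation from the running mean
with c times the pooled variance estimate; the function names are
hypothetical, and the critical value c is supplied by the caller (for p = 2
the upper α point of χ² with 2 degrees of freedom is -2 ln α).

```python
import math


def sq_dist(u, v):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def first_rejection(points, c):
    """Return the step k (1-based) at which the newest observation is first
    rejected, or None if no observation is rejected."""
    p = len(points[0])
    for k in range(2, len(points) + 1):
        xs = points[:k]
        centre = [sum(x[d] for x in xs) / k for d in range(p)]
        s2 = sum(sq_dist(x, centre) for x in xs) / (p * (k - 1))
        if sq_dist(xs[-1], centre) > c * s2:
            return k
    return None


c_alpha = -2.0 * math.log(0.05)      # upper 5% point of chi-square, 2 d.f.
obs = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5), (50, 50)]
print(first_rejection(obs, c_alpha))
```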

D. R. Cox and E. J. Snell (1968) enumerate the following types of
departure, from the usual linear regression model of one dependent variable on
n independent variables with errors ε normally distributed with zero mean and
constant variance, which can be detected by an appropriate analysis of
residuals: (1) the presence of outliers; (2) the relevance of a factor omitted
from the model, detected by plotting the residuals against the levels of that
factor; (3) non-linear regression on a factor already included in the model,
detected by plotting the residuals against the levels of that factor and
obtaining a curved relationship; (4) correlation between different ε_i's, for
example between ε_i's adjacent in time, detected from scatter diagrams of
suitable pairs of residuals, or possibly from a periodogram analysis of
residuals; (5) non-constancy of variance, detected by plotting residuals or
squared residuals against factors thought to affect the variance, or against
fitted values; (6) non-normality of the distribution of the ε_i's, detected
by plotting the ordered residuals against the expected values of the order
statistics from a standard normal distribution. The authors give a more
general definition of residuals and find some asymptotic properties. They
also discuss some illustrative examples, including a regression problem
involving exponentially distributed errors.

D. R. Cox (1968) makes miscellaneous comments on various aspects of
regression analysis, including outliers and robust estimation. He points out
that screening of data for suspect observations will often be required.
Suspect values may be examined individually in order to decide whether or not
to include them in any subsequent analysis; this is the usual procedure with
limited data. Often it is necessary to perform analyses both with and without
suspect values. When p observations are available for each individual, the
best way of looking for outliers will depend on the type of effect expected,
of which the author discusses three. With extensive data, he suggests the use
of methods of robust estimation, such as that proposed by Huber (1964), which
are insensitive to outliers.

D. R. Cox and D. V. Hinkley (1968) consider a linear regression model in
which the errors are independent and identically distributed with zero mean.
If the type of error distribution is specified, the asymptotic efficiency of
least-squares estimates relative to maximum-likelihood estimates of the
regression parameters can be found, and the authors calculate it explicitly
for an Edgeworth series, for a Pearson Type VII distribution [suggested by
Anscombe (1967)] and for a log Gamma distribution of errors.

A. S. C. Ehrenberg (1968) quotes various authors on the subject of the
justification for adopting the least-squares approach to regression analysis.
He says it appears to be a case of the practical man accepting the
theoreticians' judgement that it will give the "best" solution, and the latter
assuming the "best fit" is what the former wants.

Frank Rudolf Hampel (1968) studies the problem of robust estimation. In
the one-dimensional case, he finds the optimal solutions (with regard to
asymptotic variance and sensitivity) for the class of (sufficiently regular)
M-estimators as defined by Huber. The optimal estimators of the location
parameter in the model of normality turn out to be the Huber estimators and
the trimmed means.

J. A. Hartigan (1968) defines a Bayes measure of discordance of an
observation x, given a set of observations x_1, x_2, ..., x_n, to be the
distance between the posterior distributions of a parameter, in the presence
or absence of x. He also proposes a measure of dissimilarity between two
observations. When the number of observations is large, these two measures may
be approximated by simple functions of the log likelihood, thereby avoiding
dependence on prior distributions.

Arnljot Høyland (1968) studies the behavior of the Hodges-Lehmann estimator
of location θ* and the classical estimator (the arithmetic mean) in a
situation where the data occur naturally grouped in n blocks, c observations
per block, with the experimental conditions varying from block to block, thus
invalidating the standard assumption that the observations are independent and
identically distributed. In particular, he studies the asymptotic efficiency,
as n → ∞ with c fixed, of θ* relative to the arithmetic mean for normal and
gross error models. The relative efficiency is less than 1 for the former and
greater than 1 for the latter in all cases studied.

Peter J. Huber (1968), in a survey paper on robust estimation, begins
by attacking the dogma of normality and the associated rule of the arithmetic
mean. He points out that many mathematicians of the nineteenth and early
twentieth centuries realized that they were not universally valid, but in most
cases continued to behave as if they were, not realizing how bad the classical
estimates could be in slightly non-normal situations. The turning point did
not come until after World War II, when Tukey and his associates began to
emphasize the shortcomings of the classical estimates and propose practicable
alternatives to them. Huber defines what he means by robust estimators and
enumerates four distinct goals to be achieved by them. He gives three methods
of constructing robust estimates: (1) maximum likelihood; (2) linear
combinations of order statistics; (3) estimates based on rank tests. He
introduces the idea of asymptotic robustness, and attempts to answer
criticisms that have been levelled at asymptotic theory, restriction to
symmetric distributions and minimax theory, all of which play important roles
in robust estimation. He also considers the question of ease of computation,
pointing out that the Hodges-Lehmann estimate, for a sample of size n,
requires O(n²) operations, as compared with approximately O(n log n) for
Hodges' alternative, Huber's estimate, and the trimmed and Winsorized means.
He points out, however, that Hodges' alternative estimate is not
asymptotically normal. He closes by mentioning some other problems (largely
unsolved) in robust estimation: (1) estimation of scale parameters;
(2) estimation of location parameters in the multivariate case; (3) estimation
in the absence of translation and scale invariance; and (4) regression and
analysis of variance problems.

Masao Kogure and Hajime Makabe (1968) study the non-central distribution
of the standardized range and give an application to a process capability
study through a control chart with trend line.

P. Prescott (1968) suggests a simple estimator of the standard deviation
of a normal population as an alternative to the usual root-mean-square
estimator. The proposed estimator, which is one-third of the difference
between the means of the largest one-sixth and the smallest one-sixth of the
observations, has an asymptotic efficiency of 0.956.
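The estimator is simple enough to state in a few lines of code. The sketch
below is illustrative only: the function name is hypothetical, and taking the
integer part of n/6 as the number of observations in each tail is an
assumption for sample sizes that are not multiples of six.

```python
import random
from statistics import mean


def prescott_sd(xs):
    """One-third of the difference between the means of the largest sixth
    and the smallest sixth of the observations."""
    s = sorted(xs)
    g = len(s) // 6                  # observations in each tail (assumed rule)
    if g == 0:
        raise ValueError("need at least six observations")
    return (mean(s[-g:]) - mean(s[:g])) / 3.0


random.seed(0)
normal_sample = [random.gauss(10.0, 2.0) for _ in range(6000)]
print(prescott_sd(normal_sample))    # should be close to the true value 2
```

For a normal population the mean of the upper sixth lies about 1.5 standard
deviations above the center, so the difference between the two tail means is
about three standard deviations, which explains the divisor.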

Pranab Kumar Sen (1968a) studies the robust-efficiency of the
Hodges-Lehmann estimator when the n observations are drawn from distributions
which are symmetric about their medians and have continuous c.d.f.'s
F_1(x), ..., F_n(x) which are not necessarily identical. Sen (1968b) studies a
simple robust unbiased estimator of the regression coefficient β based on
Kendall's rank correlation coefficient tau. The estimator is the median of the
set of slopes (Y_j - Y_i)/(X_j - X_i) joining pairs of points with X_i ≠ X_j.
Sen compares its properties with those of the least squares estimator and some
other nonparametric estimators.
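The slope estimator described above can be sketched directly from its
definition. The function name is hypothetical, and the fragment computes only
the point estimate.

```python
from itertools import combinations
from statistics import median


def median_of_slopes(points):
    """Median of the slopes of the lines joining all pairs of points
    whose x-values differ."""
    slopes = [(y2 - y1) / (x2 - x1)
              for (x1, y1), (x2, y2) in combinations(points, 2)
              if x1 != x2]
    return median(slopes)


data = [(0, 0), (1, 2), (2, 4), (3, 6), (4, 50)]   # last point is wild
print(median_of_slopes(data))
```

In this example six of the ten pairwise slopes equal 2, so the single wild
point does not move the estimate, whereas it would pull the least squares
slope well above 2.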

L. de Haan and J. Th. Runnenburg (1969) study the distribution of the
quotient h_n = x_{n+1}/(x_{2n+1} - x_1) of the sample median to the sample
range for a sample of size 2n+1 from a standard normal distribution. They give
the c.d.f. of h_1 and the p.d.f. of h_n and show that h_n is asymptotically
normal. N. Bouma and A. Vehmeyer (1969) tabulate the percentiles and the
second and fourth moments (when they exist) for n = 1, 2, 3 and 4 (sample
sizes 2n+1 = 3, 5, 7 and 9). Their tabulation is based partly on the
theoretical results of de Haan and Runnenburg and partly on numerical results
obtained by Monte Carlo methods. They approximate the distribution of h_n by a
Student t distribution with ν = 2E²(r²)/σ²(r²) degrees of freedom.

M. Mahamunulu Desu and Robert H. Rodine (1969) point out that the sample
median is a median unbiased estimator of the median of a continuous population
for odd sample sizes, but not necessarily for even sample sizes. For symmetric
populations, however, the median of a sample of even size, as an estimator of
the population median ξ, is median unbiased and also unbiased in the usual
sense, and there is a class of unbiased and median unbiased estimators of ξ
which includes the sample median and the sample midrange. The authors give an
estimator, using a random selection of pairs of symmetrically placed order
statistics, Y_r and Y_{n-r+1}, which is a median unbiased estimator for any
population and unbiased for symmetric populations.

James John Filliben (1969) examines the behavior of various linear
estimators of location when the underlying distribution is known (simple
estimation) and when it is not known, but is known to belong to a prespecified
set S (robust estimation). The estimators considered include best linear
unbiased estimators and various modifications thereof, as well as trimmed and
Winsorized means. The set S consists of 34 symmetric unimodal distributions;
optimal linear robust estimators are found for S and various subsets of S.

Joseph L. Gastwirth and Herman Rubin (1969) consider the problem of finding,
for the location parameters of symmetric unimodal distributions, robust
estimators which are linear functions of the ordered observations. In
particular, they study the maximin efficient linear estimators and admissible
linear estimators proposed by Birnbaum and Laska, and obtain asymptotic
generalizations. They demonstrate that within a large class of linear
estimators there is a unique maximin efficient linear estimator for general
families of densities. They discuss in detail the special case in which the
family of densities contains the logistic and double exponential
distributions, for which they find the maximin efficient linear estimator and
compare it with the best convex combination of the individual optimum linear
estimators and with a Hodges-Lehmann type estimator based on the corresponding
maximin rank test. Because of computational difficulties, they look for a
maximin efficient estimator in smaller classes of linear estimators which are
easy to use, including the trimmed means and linear combinations of a few
sample percentiles. They show that, under suitable regularity conditions, a
maximin efficient estimator for each of these classes exists, and give some
numerical examples.

F. E. Grubbs (1969) gives an expository treatment of procedures for
determining  statistically  whether  the  highest  observation,  the  lowest  observa¬ 
tion,  the  highest  and  lowest  observations,  the  two  highest  observations,  the 
two  lowest  observations,  or  more  of  the  observations  in  the  sample  are  statisti¬ 
cal  outliers.  Included  are  statistical  formulae  and  tables  of  critical  values 
for  tests  of  significance  to  be  applied  in  detecting  outliers  in  single  samples, 
as  well  as  examples  of  their  application. 

Irwin Guttman and D. E. Smith (1969) investigate the performance of three
rules for dealing with outliers in small samples from the normal distribution
N(μ, σ²) when the primary objective of sampling is to obtain an accurate
estimate of μ. They assume that at most one observation in the sample may have
arisen from either N(μ + aσ, σ²) or N(μ, (1 + b)σ²), and measure the
performance of each rule in terms of "Protection", the fractional decrease in
the mean square error obtained by using the rule when such an observation is
actually present in the sample. Numerical results are given for n ≤ 10 when σ²
is known, but only for n = 3 when σ² is unknown.

Douglas M. Hawkins (1969) derives the distributions, under null and
alternative hypotheses, of the statistic proposed by Quesenberry and David
(1961) for detecting the presence of a single outlier.

Louis Alan Jaeckel (1969) considers various robust estimates of a location
parameter. He defines three types of location estimators: maximum likelihood
type estimators, linear combinations of order statistics, and estimators
derived from rank tests. He gives some relationships among the three types,
and shows that Huber's minimax result applies to all three. He considers two
flexible estimation procedures in which the observations are used to choose an
estimator from a family of possible estimators. The families considered
include the trimmed means and a "weighted median" of pairwise means derived
from an arbitrary rank test.

C. L. Narayana and M. Subrahmanyam (1969) suggest an alternative to the
method of least squares in the theory of regression, the object being to
reduce the computations to a minimum and still obtain fairly accurate
estimates of the slope of the regression line. They first compute the slopes
of the lines joining each pair of data points, and take (i) a simple average
of these slopes, (ii) a weighted average with weights equal to the
denominators of the respective slopes, or (iii) a weighted average with
weights equal to the squares of the denominators of the respective slopes.
They prove that (iii) is equivalent to the method of least squares, and show
by examples that (i) and (ii) are simpler and nearly as efficient.
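The three averages of pairwise slopes, and the equivalence of (iii) with
least squares, can be checked numerically. The sketch below is an
illustration under stated assumptions: the function names and data are
hypothetical, and the absolute value of the denominator is used as the weight
in (ii) so that the order of the points does not matter.

```python
import math
from itertools import combinations


def pairwise_slope(points, power):
    """Weighted average of pairwise slopes with weights |x_j - x_i|**power:
    power = 0 gives (i), power = 1 gives (ii), power = 2 gives (iii)."""
    num = den = 0.0
    for (x1, y1), (x2, y2) in combinations(points, 2):
        dx = x2 - x1
        if dx == 0:
            continue                 # slope undefined for tied x-values
        w = abs(dx) ** power
        num += w * (y2 - y1) / dx
        den += w
    return num / den


def ls_slope(points):
    """Ordinary least squares slope."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    return sxy / sxx


data = [(1, 1.1), (2, 1.9), (3, 3.2), (4, 3.9), (6, 6.3)]
print([pairwise_slope(data, p) for p in (0, 1, 2)], ls_slope(data))
print(math.isclose(pairwise_slope(data, 2), ls_slope(data)))
```

The agreement of (iii) with least squares follows because the sum over pairs
of (x_j - x_i)(y_j - y_i) equals n times the sum of products of deviations
from the means, and likewise for the squared denominators.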

P. V. Rao and J. I. Thornby (1969) define a robust point estimator of the
parameter β in the generalized regression model y_j = α + g_j(β) + z_j,
j = 1, 2, ..., n, where α and β are unknown parameters, g_1, g_2, ..., g_n are
real-valued functions of a real variable satisfying suitable conditions and
z_1, z_2, ..., z_n are independent identically distributed random variables
having a distribution function belonging to a specified class. An important
special case of this model is the regression model obtained by setting
g_j(β) = βx_j, j = 1, 2, ..., n, where the x's are known constants. For this
case, Adichie has proposed a robust estimator of β of the Hodges-Lehmann type
and Brown and Mood have proposed a median estimator. A third alternative is
provided by the estimator proposed by the authors, which is also of the
Hodges-Lehmann type.

Ram  Swaroop,  Kenneth  A.  West  and  Charles  E.  Lewis,  Jr.  (1969)  present  a 
statistical  technique,  and  the  related  computer  program,  for  identifying  the 
outliers in univariate data.

Kei  Takeuchi  (1969)  proposes  an  estimator  of  the  location  parameter  of  a 
continuous  symmetric  distribution  which  is  a  linear  combination  of  the  order 
statistics  of  a  (fictitious)  subsample  of  size  k  drawn  randomly  from  the  order 
statistics  of  a  sample  of  size  n.  He  proves  that  this  estimator  is  asymptoti¬ 
cally  efficient  for  a  wide  class  of  distributions  satisfying  certain  regularity 
conditions,  and  shows  by  a  Monte  Carlo  study  that  a  modified  version  with 
symmetric  coefficients  attains  high  relative  efficiency  for  several  varieties 
of distributions even for small sample size (n = 10, 15, 20 or even n = 5).

Jerry  Thomas  (1969)  reports  the  results  of  a  Monte  Carlo  investigation 
of  the  effect  of  non-normality  on  the  distribution  of  Dixon’s  criteria  for 
detecting outlying observations. He reaches the conclusion that Dixon's criteria
are  not  robust  and  may  yield  incorrect  decisions  for  skewed  distributions. 

John E. Walsh (1969) studies the sample size n required for approximate
independence between the sample median and the largest (or smallest) order
statistic, which he measures by the maximal value $\epsilon$ of the difference
between their true joint probability and the corresponding value assuming
independence. He finds that the following inequality holds:
$2n + 1 \ge 1 + \epsilon^{-2}/(2\pi e^2) = 1 + 0.0215/\epsilon^2$
$(\epsilon \le 0.02)$.

Takashi Yanagawa (1969) proposes a new robust estimate of location defined by
$T_{N,p} = \sum_{i_1 < i_2 < \cdots < i_p} \operatorname{med}(X_{i_1}, X_{i_2}, \ldots, X_{i_p}) \big/ \binom{N}{p}$,
the mean of the medians, $\binom{N}{p}$ in number, of p-tuples
$(X_{i_1}, X_{i_2}, \ldots, X_{i_p})$ obtained from the original random sample
of size N. He compares this estimate for p = 3 with other estimates for small
samples from normal and double exponential populations, and finds that it is
the most robust in a class including the sample mean, the best linear unbiased
estimate for the double exponential distribution, and the Hodges alternative
to the Hodges-Lehmann estimate.
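
The definition above can be sketched in a few lines of Python. This is only an illustration of the mean-of-subset-medians idea as stated in the summary; the function name `mean_of_subset_medians` is hypothetical, and the brute-force enumeration of all p-subsets is practical only for small samples.

```python
from itertools import combinations
from math import comb
from statistics import median


def mean_of_subset_medians(sample, p):
    """Average the medians of all p-element subsets of the sample."""
    if not 1 <= p <= len(sample):
        raise ValueError("p must be between 1 and the sample size")
    # Sum the median of every subset (i_1 < i_2 < ... < i_p) ...
    total = sum(median(subset) for subset in combinations(sample, p))
    # ... and divide by the number of subsets, C(N, p).
    return total / comb(len(sample), p)


# One gross outlier: the ordinary mean (p = 1) is dragged to 22,
# while the p = 3 estimate stays with the bulk of the data.
data = [1, 2, 3, 4, 100]
print(mean_of_subset_medians(data, 1))
print(mean_of_subset_medians(data, 3))
```

With p = 1 the estimate reduces to the sample mean and with p = N to the sample median, so p acts as a dial between the two.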

V.  P.  Zelenen’kiy  (1969)  considers  methods,  based  on  statistical  decision 
theory,  for  the  exclusion  of  anomalous  measurements  of  random  processes.  He 
proposes  various  solutions  with  differing  amounts  of  information  about  the 
a  priori  statistical  characteristics  of  the  measured  processes  and  the  measure¬ 
ment  errors. 

V. D. Barnett (1970) studies the problem, suggested by a medical example,
of fitting a linear functional model with replicated observations and
inhomogeneous error variances. For a particular error structure relevant to the
example, he finds maximum-likelihood estimators of the parameters in the model
(slope, intercept and error variances). He obtains simple closed-form expressions
for the asymptotic standard errors of the estimators, even though the estimators
have no simple explicit form and must be evaluated by iterative methods.

Allan Birnbaum and Valerie Mike (1970) develop approximate versions of
optimally robust Pitman-type estimators of location, and show that they have
full asymptotic efficiency for a prototype family of distributions. By means
of a Monte Carlo study for moderate n (20 to 100), they show that these
estimators have efficiencies of 85% or more for normal, logistic, double
exponential, and contaminated normal distributions.

Wayne A. Fuller (1970) investigates simple estimators of the mean of skewed
populations under the assumption that the tail of the distribution is well
approximated by the tail of a Weibull distribution. He considers relatively simple
estimators, in particular those that are linear in the order statistics or that
may be expressed as a linear function of the order statistics with weights that
depend on a preliminary test. The loss in efficiency when the proposed
estimators are used for populations for which the sample mean performs well
(such as the exponential) is very small relative to the gain for heavily skewed
populations.

J. L. Gastwirth and M. L. Cohen (1970) discuss the small-sample behavior
of various robust linear estimators. They find that for sample size 20 the
variances of these estimators are well approximated by asymptotic theory. If
the observations are assumed to come from a family of distributions consisting
of the Cauchy, double exponential, normal, contaminated normal and logistic
distributions, then the asymptotically maximin efficient estimator, the 27-1/2%
trimmed mean, is the maximin efficient estimator for size 16 or greater, but
its minimum relative efficiency for samples of size 16 is less than asymptotic
theory suggests. If the Cauchy distribution is deleted from the above family,
the 20% trimmed mean is the maximin efficient linear estimator for all sample
sizes studied.


Harter (1970) collects in two volumes various results (theory and tables),
from earlier publications authored or co-authored by him, on order statistics
and their use in testing and estimation. Volume 1 includes tables of probability
integral, percentage points and moments of the range of samples from a normal
population [first published by Harter & Clemm (1959)] and a table of the
probability density function of the range of samples from a normal population
[first published by Harter (1962)]. Volume 2 includes tables of expected values,
variances, standard deviations, probability integral and percentage points of
quasi-ranges of samples from a normal population [previously published by
Harter (1958,1959,1962)] and of the probability integral and percentage points
of the range of samples from a rectangular population [previously published by
Harter (1961a)].

Huber (1970) considers the problem of studentizing robust estimates. Let
T be a robust estimate of a (location) parameter $\theta$. Huber notes that little
has been written about the estimated standard deviation (e.s.d.) s(T)
appropriate for T or about the studentized ratio $(T - \theta)/s(T)$, beyond the
general philosophy outlined by Tukey & McLaughlin (1963): Choose the T in the
numerator to achieve a high robustness of performance; then match it with a
denominator s(T) to achieve a high robustness of validity over a broad range of
distributions. As a result of his investigation, Huber draws the following
conclusions: (1) The trimmed mean, scaled by the Winsorized e.s.d., has both
excellent small sample and excellent large sample properties; since it is also
easy to compute, it can be strongly recommended for practical use; (2) The
trimmed mean and a maximum likelihood type estimate T developed by Huber behave
well even if the underlying distribution fails to have a density, but the
corresponding estimates s(T) do not.
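
As an illustration of conclusion (1), the following sketch computes a trimmed mean together with a Winsorized e.s.d. The scaling used here, the Winsorized sum of squared deviations divided by h(h - 1) with h = n - 2g observations kept, is the form usually attributed to Tukey & McLaughlin (1963); it is stated as an assumption, since the summary above does not give the formula. The function names are hypothetical.

```python
from math import sqrt


def trimmed_mean_and_esd(sample, g):
    """Return the g-trimmed mean and its Winsorized e.s.d.

    g observations are trimmed from each end of the ordered sample.
    """
    x = sorted(sample)
    n = len(x)
    h = n - 2 * g                      # observations kept after trimming
    if g < 0 or h < 2:
        raise ValueError("need at least two observations after trimming")
    kept = x[g:n - g]
    trimmed_mean = sum(kept) / h
    # Winsorize: replace the g smallest (largest) values by the
    # smallest (largest) value that was kept.
    winsorized = [kept[0]] * g + kept + [kept[-1]] * g
    w_mean = sum(winsorized) / n
    ssd = sum((v - w_mean) ** 2 for v in winsorized)
    # Assumed scaling: Winsorized SSD divided by h(h - 1).
    return trimmed_mean, sqrt(ssd / (h * (h - 1)))


def studentized_ratio(sample, g, theta0):
    """(T - theta0) / s(T) for the g-trimmed mean T."""
    t_mean, esd = trimmed_mean_and_esd(sample, g)
    return (t_mean - theta0) / esd


print(trimmed_mean_and_esd([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], 1))
```

With g = 0 the pair reduces to the ordinary mean and the usual standard error of the mean, which gives a quick sanity check.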


D. G. Kabe (1970) expresses the distributions of Dixon's statistics for
the rejection of outlying observations in the case of an exponential population
in terms of finite series of Beta functions, from which the probabilities of
rejection of suspected outliers can be easily calculated on a desk calculator,
thus making tables such as those of Likes (1966) unnecessary.

R. M. Loynes (1970) obtains bounds on the asymptotic relative efficiency,
as an estimator of the central value of a symmetric distribution, of the power
mean of order q with respect to the power mean of order p, where 1 < p < q, which
are $(p-1)^2/(q-1)^2$ and $\infty$, both bounds being the best possible. If the
underlying distribution is unimodal, the lower bound can be improved to
$(2p-1)/(2q-1)$, again the best possible.

C. Singh (1970) obtains the probability integral of the range of samples
from a population whose distribution can be represented by the first four terms
of an Edgeworth series. He tabulates the numerical values of the corrective
functions arising because of nonnormality. He compares the new theoretical
results with the earlier results of various authors, including Pearson (1950),
Cox (1954), David (1954) and Singh (1967). He concludes that, for Edgeworth
type populations, the effect of parental skewness on the probability integral
and percentage points of the range is comparatively small for samples of size
n < 5, but as n becomes larger this effect becomes as prominent as that of
kurtosis except in the lower tail of the distribution. The effect of
$\lambda_3^2 = \alpha_3^2$ is almost always opposite to the effect of
$\lambda_4 = \alpha_4 - 3$, and they sometimes counterbalance each other so
that the resultant is close to the normal theory value.

Paul Switzer (1970) discusses various methods of obtaining robust estimators
of a location parameter $\theta$. He proposes choosing (say) three reasonable
candidate estimators for which non-parametric estimates of the standard error
are available, then actually computing each for the data at hand and using the
one which has the smallest estimated standard error. One general procedure is
to assume that the sample of size N can be divided into K blocks of equal size
n = N/K, compute the three estimates $\hat\theta_{ik}$ for each block k,
k = 1, 2, ..., K; i = 1, 2, 3; then take the overall estimate $\hat\theta_i$ to
be the average of the block estimates, and estimate its standard error by
$S_i$; choose that $\hat\theta_i$ for which the estimated standard error $S_i$
is smallest. As a special case he considers a sample size N divisible by 6; he
divides the data into K = N/6 equal groups (at random). In each group k (of
size 6), he computes, as the candidate estimates,
$\hat\theta_1 = (X_3 + X_4)/2$, $\hat\theta_2 = (X_2 + X_5)/2$,
$\hat\theta_3 = (X_1 + X_6)/2$, where the X's are the ordered values in the
group. He reports the results of a Monte Carlo study of this procedure applied
to samples of size N = 30, 60, and 120 drawn from a short-tailed (uniform)
distribution, a normal distribution, and a long-tailed (contaminated normal)
distribution. The results are as expected, with $\hat\theta_3$ strongly
favored by the short-tailed distribution and strongly disfavored by the
long-tailed distribution.
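
The special case with groups of six can be sketched as follows. Two points are assumptions rather than details taken from the summary: the standard error of each overall estimate is computed here as the sample standard deviation of the K block estimates divided by the square root of K, and the function name `switzer_estimate` and the `seed` argument are hypothetical.

```python
import random
from math import sqrt
from statistics import mean, stdev


def switzer_estimate(data, seed=0):
    """Choose, among three block-averaged candidates, the one whose
    estimated standard error is smallest.

    Returns (candidate number 1..3, overall estimate, standard error).
    """
    if len(data) % 6 != 0 or len(data) < 12:
        raise ValueError("sample size must be a multiple of 6, at least 12")
    values = list(data)
    random.Random(seed).shuffle(values)      # random division into groups
    blocks = [sorted(values[i:i + 6]) for i in range(0, len(values), 6)]
    k = len(blocks)
    # For ordered values X1 <= ... <= X6 (0-based indices below):
    # candidate 1 = (X3 + X4)/2, 2 = (X2 + X5)/2, 3 = (X1 + X6)/2.
    pairs = [(2, 3), (1, 4), (0, 5)]
    results = []
    for lo, hi in pairs:
        block_estimates = [(b[lo] + b[hi]) / 2 for b in blocks]
        overall = mean(block_estimates)
        std_err = stdev(block_estimates) / sqrt(k)   # assumed formula
        results.append((std_err, overall))
    best = min(range(3), key=lambda i: results[i][0])
    return best + 1, results[best][1], results[best][0]


# A single wild value makes the midrange-type candidate (3) unstable,
# so one of the inner candidates is selected instead.
sample = [-2, -1, 0, 1, 2, 0] * 4 + [-1, 0, 1, 0, 2, 1000]
print(switzer_estimate(sample))
```

Because the block estimates are averaged, the selected candidate is not the estimator one would compute on the undivided sample; the blocking is what makes a simple standard error available.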

Allan Birnbaum, Eugene Laska and Morris Meisner (1971) determine
maximin-efficient linear unbiased estimators (MLUEs), and their efficiencies, for ordered
samples  of  sizes  5(5)20  from  a  family  of  nine  distributions.  Since  these 
distributions  admit  simple  orderings  such  that  the  MLUE  over  any  subset  is  just 
the  MLUE  over  the  extreme  pair  in  the  ordered  subset,  they  are  able  to  summarize 
the  results  compactly.  They  also  discuss  relations  to  other  estimators. 

F. C. Duckworth (1971) demonstrates that the standard method of least squares
is unsuitable for analyzing certain types of data, either because it fails to
appreciate the strong dependence of the prediction error on the independent
variable or because it is unable to allow for any error introduced by the fitting
of an inappropriate equation. He contends that a semi-subjective graphical technique
may often give more useful results, though he admits that such a method also has
disadvantages: besides being partially subjective, it is also rather cumbersome
and not easily computerizable, and there is no rigorous method of determining
the error of estimate.

Arthur L. Edwards (1971) presents a set of computer subroutines, FITTER,
RKGUS, and FUNNY, written for finding linear least-squares fits of weighted
tabular data with any of several functional forms, and for evaluating the final
function at specified values of the independent variable. Provision is made for
several functional forms, including power series in the independent variable
or its reciprocal, with or without a constant term, and other functional forms
may be added as desired.

M. V. Johns, Jr. (1971) develops a sequence of estimators, indexed by an
integer-valued parameter k, which are asymptotically efficiency-robust in the
sense that, for any k, the corresponding estimator is consistent and
asymptotically normally distributed (as the sample size n increases) for any F
in a large subset $S_1$ of the class of symmetric distributions and, for large
k, the corresponding estimator is (nearly) best asymptotically normal for all
$F \in S_1$. The simplest non-trivial estimator in the proposed sequence
(corresponding to k = 2) exhibits quite high efficiencies for small to moderate
sample sizes (n = 10, 20, 40) for a collection of distributions comprising the
normal, the Cauchy, the logistic, the double exponential, and the 10%
contaminated normal.




Elmer  E.  B—renga  and  R.  G.  Burdick  (1971)  describe  a  stepwise  computer 
procedure for identifying and setting aside extreme values from sets of data
without bias or subjectivity on the part of the analyst. Since truncation
of a basically normal distribution causes a downward bias in the estimated
variance, a graphical method is provided to compensate for the bias.

Ram Swaroop and William R. Winter (1971) present a statistical technique
and the necessary computer program for editing multivariate data. The technique
is especially useful when large quantities of data are collected and the editing
must be performed automatically. One task in the editing process is the
identification of outliers which deviate markedly from the rest of the sample.
The technique presented, a multivariate analog of the univariate technique of
Swaroop, West & Lewis (1969), considers the statistical linear relationship
between the variables in identifying the outliers. It is assumed that the data
are from a multivariate normal population and that the sample size exceeds the
number of variables by at least two.

John Caso (1972) presents the results of an extensive literature search
in the area of robust estimation techniques. He gives a descriptive analysis
of several robust estimators of the location parameter of symmetric distributions.
These estimators, chosen because they are computationally and theoretically
tractable and can be easily understood by a practitioner, are the trimmed and
Winsorized means, the Hodges-Lehmann estimator, Huber's estimator, Hogg's
estimator and Switzer's estimator. Caso also reports the results of a Monte
Carlo study, based on 4200 samples each of sizes 12 and 24 from five symmetric
probability distributions (rectangular, triangular, normal, contaminated normal
and double exponential), of the efficiency of the robust estimators relative to
the best estimator for the distribution under consideration. The results show
that the robust estimators provide a higher guaranteed efficiency than the best
estimator for any particular distribution in the family.

Eisenhart (1972) traces the development of the concept of the best mean
of a set of measurements from antiquity to the present day. He reports instances
of the use of the mode by the Greeks as far back as the fifth century B.C. and
of the midrange by the Arabs around 1000 A.D. The median and the arithmetic
mean apparently came into use much later, around 1600. By the early nineteenth
century, the principle of the arithmetic mean had become widely (though not
universally) accepted, but it met its downfall at the hands of Poisson (1824),
who pointed out that for the Cauchy distribution with p.d.f.
$f(x) = 1/[\pi(1 + x^2)]$, $-\infty < x < +\infty$, the arithmetic mean has the
same distribution (for which the moments do not exist) as a single observation.
The author also comments on the theory of statistical estimation from the time
of Laplace, D. Bernoulli, and Gauss to the present, and closes with some
comments on modern robust estimation, which he says arose out of World War II
arguments as to whether mean deviation or standard deviation is a better
measure of dispersion in gunnery and bombing situations.


6. CONCLUSIONS AND RECOMMENDATIONS

1. The best choices of measures of central tendency and dispersion and of
methods for fitting linear (or nonlinear) regression equations depend upon the
error law, i.e. the distribution of the errors or residuals. For the three
common laws of error shown on the accompanying graph (Figure 1), the best
choices are as follows:

FIG. 1. PROBABILITY DENSITY FUNCTIONS FOR THREE COMMON LAWS OF ERROR,
STANDARDIZED (MEAN = 0, STANDARD DEVIATION = 1, AREA UNDER CURVE = 1)

See glossary of code letters following list of references.

Taken from the median, not from the arithmetic mean.

This conclusion follows from the theory of power means, as developed by Fechner
(1874), Brusa (1938) and others. The results are corroborated by the following
ratios of variances and asymptotic variances of median, mean, and midrange for
the specified distributions:

$$\text{L1: } \frac{\mathrm{A\,Var(MD)}}{\mathrm{A\,Var(AM)}} = \frac{1}{2}; \qquad
\text{N: } \frac{\mathrm{A\,Var(MD)}}{\mathrm{A\,Var(AM)}} = \frac{\pi}{2}; \qquad
\text{U: } \frac{\mathrm{Var(MR)}}{\mathrm{Var(AM)}} = \frac{6N}{(N+1)(N+2)}$$

2. For any other member of the exponential family of error laws, with
probability density function of the form $f(x) \propto \exp(-c\,|x - \theta|^p)$,
the best choice of measure of central tendency is the power mean of order p,
with which are associated the measure of dispersion which is the pth root of
the mean of the absolute pth powers of deviations from the power mean of order
p and the method of fitting the regression equation which minimizes that
measure of dispersion of the residuals. When p is not an integer or is an
integer greater than 2, this does not lead to simple procedures.
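
A minimal sketch of the power mean of order p, taken here to be the value m that minimizes the sum of the absolute pth powers of the deviations, may make the statement concrete. The closed forms for p = 1, 2 and infinity (median, arithmetic mean, midrange) are standard; the ternary search used for other values of p is simply one convenient numerical method chosen for this illustration, and the function names are hypothetical.

```python
from statistics import mean, median


def power_mean(sample, p, iterations=200):
    """Value m minimizing sum(|x - m|**p); p may be float('inf')."""
    lo, hi = min(sample), max(sample)
    if p == 1:
        return median(sample)        # midpoint of the minimizing interval
    if p == 2:
        return mean(sample)
    if p == float("inf"):
        return (lo + hi) / 2         # midrange
    if p < 1:
        raise ValueError("the objective is convex only for p >= 1")

    def loss(m):
        return sum(abs(x - m) ** p for x in sample)

    # Ternary search: for p >= 1 the loss is convex in m on [lo, hi].
    for _ in range(iterations):
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        if loss(a) < loss(b):
            hi = b
        else:
            lo = a
    return (lo + hi) / 2


def power_dispersion(sample, p):
    """p-th root of the mean absolute p-th power of deviations."""
    m = power_mean(sample, p)
    if p == float("inf"):
        return max(abs(x - m) for x in sample)
    return (sum(abs(x - m) ** p for x in sample) / len(sample)) ** (1 / p)


data = [1, 2, 3, 4, 10]
for order in (1, 2, 4, float("inf")):
    print(order, power_mean(data, order), power_dispersion(data, order))
```

As p increases the minimizer moves from the median toward the midrange, which shows how the weight given to extreme observations grows with p.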

3. For error laws which are not members of the exponential family, it is not
clear what the best choices are.

4. For the sake of simplicity, it is recommended that the choice be restricted
to the three sets in paragraph (1) above. For symmetric distributions (whose
standardized third moment about the mean, $\alpha_3 = \mu_3/\sigma^3$, is equal
to zero), the value of the standardized fourth moment about the mean,
$\alpha_4 = \mu_4/\sigma^4$, can be used as a criterion to decide which of the
three sets of choices should be made. For $\alpha_4$ near 3.0, the value for the
normal distribution, one would obviously want to make the choices which are
associated with that distribution. For those with substantially higher values
of $\alpha_4$, one would expect the choices associated with Laplace's first
distribution (for which $\alpha_4 = 6.0$) to give better results. Similarly, for
those with substantially lower values of $\alpha_4$, one would expect the
choices associated with the uniform distribution (for which $\alpha_4 = 1.8$) to
give better results. On the basis of theoretical results of Rider (1957) and
empirical results of Hogg (1967), it is recommended that the first set of
choices in paragraph (1) above be made for $\alpha_4 > 3.8$, the second set for
$2.2 \le \alpha_4 \le 3.8$, and the third set for $\alpha_4 < 2.2$. If, as will
usually be the case, the population value of $\alpha_4$ is unknown, the
corresponding sample value should be used.
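
The rule in paragraph 4 can be sketched as below. The thresholds 2.2 and 3.8 are those stated above; pairing the three sets with the median (Laplace's first law), the arithmetic mean (normal law) and the midrange (uniform law) is an assumption based on the theory of power means, and the function names are hypothetical.

```python
from statistics import mean, median


def sample_alpha4(sample):
    """Sample standardized fourth moment m4 / m2**2 (divisor n)."""
    m = mean(sample)
    n = len(sample)
    m2 = sum((x - m) ** 2 for x in sample) / n
    m4 = sum((x - m) ** 4 for x in sample) / n
    return m4 / m2 ** 2              # undefined for a constant sample


def choose_center(sample):
    """Pick the measure of central tendency from the sample alpha_4."""
    a4 = sample_alpha4(sample)
    if a4 > 3.8:                     # long tails: Laplace's first law
        return "Laplace", median(sample)
    if a4 >= 2.2:                    # near 3.0: normal law
        return "normal", mean(sample)
    return "uniform", (min(sample) + max(sample)) / 2   # short tails


print(choose_center(list(range(1, 101))))        # evenly spread values
print(choose_center([0] * 20 + [10, -10]))       # two extreme values
```

The sample value of alpha_4 is itself sensitive to outliers, so the rule is best applied after the screening discussed in paragraph 6.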

5. All three of the above procedures are adversely affected by asymmetry in
the distributions of errors or residuals, which occurs if positive and negative
errors of the same magnitude are not equally likely. If the deviations from
the chosen measure of central tendency in a one-dimensional array or the
deviations from linear or nonlinear regression indicate asymmetry, as evidenced
by a sample value of $\alpha_3$ which differs significantly from zero (the
standard error of $\alpha_3$ is $\sqrt{6/n}$), consideration should be given to
transforming the data so as to reduce the asymmetry as much as possible, and
then analyzing the transformed data instead of the original data.
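
The check described in paragraph 5 can be sketched as follows. The standard error sqrt(6/n) is the one quoted above; the cutoff of two standard errors (`z_crit = 2.0`) is an assumption made for this illustration, as is the function name.

```python
from math import sqrt
from statistics import mean


def skewness_check(sample, z_crit=2.0):
    """Return (alpha_3, standard error, flag for significant asymmetry)."""
    n = len(sample)
    m = mean(sample)
    m2 = sum((x - m) ** 2 for x in sample) / n
    m3 = sum((x - m) ** 3 for x in sample) / n
    alpha3 = m3 / m2 ** 1.5          # standardized third moment
    std_err = sqrt(6 / n)            # standard error quoted in the text
    return alpha3, std_err, abs(alpha3) > z_crit * std_err


# A long right tail gives a large positive alpha_3.
skewed = [1] * 50 + [2] * 25 + [4] * 12 + [8] * 6 + [16] * 3 + [32] * 2 + [64] * 2
print(skewness_check(skewed))
```

If the flag is raised, the same function can be applied again after a trial transformation (for example the logarithm of positive data) to see whether the asymmetry has been reduced.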

6. All three of the above procedures are also adversely affected by the
presence of spurious observations which may have resulted from gross blunders or
some undetected change in the quantity measured or in the conditions of
measurement. The adverse effect is most pronounced in the case of the
procedure which is appropriate for the uniform distribution, which depends heavily
on the extreme observations, and least so for the procedure which is appropriate
for Laplace's first distribution, with the procedure appropriate for the normal
distribution occupying an intermediate position. If there is any reason to
suspect the presence of spurious observations, the procedure based on the
uniform distribution should never be used, and that based on the normal distribution
should be used only after applying one of the modern criteria for the rejection
of outliers, most of which are based on normal theory. Outliers at one extreme
only may produce false indications of asymmetry which vanish when they are
rejected. Outliers at both extremes are likely to yield high values of $\alpha_4$,
which would lead to use of the procedures based on Laplace's first distribution;
after they have been rejected, procedures based on the normal distribution may
be appropriate. In such cases, it is often difficult to decide whether the
extreme observations are spurious or whether they are genuine observations from
a distribution (such as Laplace's first) with a high value of $\alpha_4$.

7. Much can be learned by plotting the data. In the case of a one-dimensional
array, one can form a preliminary impression as to skewness (as measured by
$\alpha_3$), kurtosis (as measured by $\alpha_4$), and the presence of spurious
observations. In the case of two variables, one can form a preliminary
impression as to whether the relation is linear or nonlinear; if the latter,
one can get some idea as to the type of curvilinear relation that should be
fitted. After the regression equation has been fitted, the residuals should be
plotted against the independent variable. The presence of any systematic
pattern may indicate that the wrong type of relation has been fitted. Mention
should be made of two systematic patterns
that may occur: (1) A correlation between the independent variable X and the
magnitude of the residuals $|Y - \hat{Y}|$ may indicate the need to transform
the data before analysis and find the regression on X, not of Y, but of log Y,
$Y^{0.5}$, or $Y^{-1}$; and (2) If the residuals tend to be of one sign for
extreme values of the independent variable and of the opposite sign for
intermediate values, this may indicate the need to fit a curvilinear instead of
a linear relation. If no such pattern exists, the residuals may then be treated
as a univariate array and analyzed accordingly (for skewness significantly
different from zero, kurtosis for which the procedure used is inappropriate, or
the presence of spurious observations which may not have been detected on the
initial two-dimensional plot). If any of these conditions is discovered,
appropriate steps to alleviate it can be taken and the resulting data
reanalyzed.
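
The two residual patterns mentioned in paragraph 7 can also be screened numerically as a supplement to the plot. The sketch below fits a straight line by ordinary least squares and returns two summary numbers; both summaries, the split of the x-range into thirds, and all function names are choices made for this illustration rather than procedures given in the text.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope."""
    xb, yb = mean(xs), mean(ys)
    sxx = sum((x - xb) ** 2 for x in xs)
    slope = sum((x - xb) * (y - yb) for x, y in zip(xs, ys)) / sxx
    return yb - slope * xb, slope


def correlation(us, vs):
    """Product-moment correlation; 0.0 if either series is constant."""
    ub, vb = mean(us), mean(vs)
    suv = sum((u - ub) * (v - vb) for u, v in zip(us, vs))
    suu = sum((u - ub) ** 2 for u in us)
    svv = sum((v - vb) ** 2 for v in vs)
    return suv / (suu * svv) ** 0.5 if suu > 0 and svv > 0 else 0.0


def residual_diagnostics(xs, ys):
    """Summaries for pattern (1), spread growing with x, and
    pattern (2), residual sign differing between ends and middle."""
    if len(xs) < 6:
        raise ValueError("need at least six points")
    a, b = fit_line(xs, ys)
    pairs = sorted(zip(xs, (y - (a + b * x) for x, y in zip(xs, ys))))
    x_sorted = [x for x, _ in pairs]
    resid = [r for _, r in pairs]
    third = len(resid) // 3
    ends = mean(resid[:third] + resid[-third:])
    middle = mean(resid[third:len(resid) - third])
    return {
        "spread_corr": correlation(x_sorted, [abs(r) for r in resid]),
        "ends_minus_middle": ends - middle,
    }


xs = list(range(12))
print(residual_diagnostics(xs, [x * x for x in xs]))   # curved relation
```

A value of `spread_corr` far from zero points to pattern (1), and a value of `ends_minus_middle` that is large relative to the residual scale points to pattern (2); neither number replaces looking at the plot itself.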

8. If a function $y = f(x)$ has been computed (or measured) quite accurately
and  rounded  values  have  been  tabulated,  the  distribution  of  errors  in  the 
tabular  values  is  uniform  between  -0.5  and  +  0.5,  in  units  of  the  last  digit 
retained.  This  should  be  borne  in  mind  in  approximating  the  tabular  values  by 
a linear or nonlinear regression equation.

AR  Anscombe's rules (for rejection of outliers)
    averages (all types)
1C  kttMi's criteria (for rejection of outliers)
BT  best two (out of three)
CC  Chauvenet's criterion (for rejection of outliers)
CM  Cauchy's method (of interpolation)
CT  (Bliss-)Cochran-Tukey criterion (for rejection of outliers)
CD  CUocmi's criterion (for rejection of outliers)
DA  discard averages [trimmed means]
DC  Dixon's criterion (for rejection of outliers)
DD  discard deviation
DI  dispersion (measures of)
EA  equal areas (under joint p.d. curve) [Laplace's "most advantageous method"]
EM  Edgeworth's modification (of Stone's second criterion)
EX  extremes (largest and smallest values in sample)
FC  Ferguson's criterion (for rejection of outliers)
GA  Gastwirth estimators
GC  Glaisher's criterion (for rejection of outliers)
HD  Huber's estimator
IC  Irwin's criterion (for rejection of outliers)
IR  interquartile range
LD  largest (absolute) deviation
LF  least (absolute sum of) first (powers) [Laplace's "method of situation"]
LN  least number of deviations (least sum of zero powers)
LR  linear regression
LS  least squares
LW  linearly weighted means
MA  method of averages
MC  Merriman's criterion (for rejection of outliers)
MD  median
MK  McKay's criterion (for rejection of outliers)


ML  maximum likelihood

W.  atrefwr  m£?ccS  Erisijr-Be  mrxBJK  rsm?j 
30 

50  acrisHacTri  ?«►  aaexsgs 
MR  midrange

HI  aggjja  ea  tag  otasr  csaer  statistics 
»  acltiraxigsg  MBs*  criterion  (for  tejectisa  of  caliers) 
HZ  3itragjpiies  criterion  {Sot  rejection  of  cstliers) 

3ft  mxi.BJ?  (ssso  of)  fporta  (powers  of  p.<L£.  of  errors) 

NC  Nair's criterion (for rejection of outliers)
NM  Newcomb's method (of treating outliers)
NR  nonlinear regression
NS  Nair-Shrivastava method (of curve fitting)
OS  order statistics
PA  plus approximative methods [most approximative method]
PC  Peirce's criterion (for rejection of outliers)
PM  power means
QA  quadratic average (mean)
QD  quartile deviation [semi-interquartile range]
QM  quasi-midrange [quasi-median]
QN  quantiles
QR  quasi-range
RA  range
RC  Rotme's criterion (for rejection of outliers)
RL  robust estimators of location
SC  Stone's (first) criterion (for rejection of outliers)
SD  standard deviation [or variance = (SD)^2]
SM  Stewart's method (criterion) (for rejection of outliers)
Sk. sadgegg
ST  Student's rule (for rejection of outliers)
SW  Switzer's estimator
S2  Stone's second criterion (for rejection of outliers)
TC  Tippett's criterion (for rejection of outliers)
TH  theory (of) errors
TF  Tukey's FUNOR-FUNOM procedure
TJ  Topsoe-Jensen criterion (for rejection of outliers)
TM  Thompson's method (criterion) (for rejection of outliers)
TO  treatment of outlying observations
VC  Vallier's criterion (for rejection of outliers)
WA  weighted average
WC  Wright's criterion (for rejection of outliers)
WH  Wright-Hayford (criterion) (for rejection of outliers)
WI  Winsorization
WM  Winsorized means
WR  Walsh's rule (criterion) (for rejection of outliers)
YE  Yanagawa's estimator